Abstract

We propose a semidefinite programming (SDP) algorithm for community detection in the 
stochastic block model, a popular model for networks with latent community structure. We prove 
that our algorithm achieves exact recovery of the latent communities, up to the information-theoretic
limits determined by Abbe and Sandon [AS15a]. Our result extends prior SDP approaches
by allowing for many communities of different sizes. By virtue of a semidefinite approach,
our algorithms succeed against a semirandom variant of the stochastic block model,
guaranteeing a form of robustness and generalization. We further explore how semirandom 
models can lend insight into both the strengths and limitations of SDPs in this setting. 
1 Introduction


1.1 Community detection 

The stochastic block model is among the most actively studied models for networks with latent 
community structure. Such models lend themselves to the statistical task of recovering the community
structure given a single such graph, a task known as community detection. This task
manifests itself as that of finding human communities in social networks, and that of determining
sets of proteins that function together in protein-protein interaction networks; more generally,
community detection is the analogue of clustering for network data. 

Since the introduction of the stochastic block model by Holland [HLL83], a wide range of
algorithmic approaches have been deployed. Such algorithms are generally compared through the 
range of model parameters in which they succeed in recovering information; whether they achieve 
exact recovery of the underlying community structure, or only partial recovery of community 
structure correlated with the truth; and their computational efficiency. 

In recent years, the block model has received a surge of interest as a meeting point of machine 
learning, algorithms, and statistical mechanics. A series of conjectures by Decelle et al. [DKMZ11]
predicted a sharp threshold, separating a range of model parameters in which partial recovery 
is possible from a range in which it is impossible, using non-rigorous techniques from statistical 
physics. Some of these threshold effects have been proven [MNS14, Mas14], along with analogues
for exact recovery [ABH16, AS15a], and among the first algorithms to achieve exact recovery all
the way up to these thresholds were those based on the powerful algorithmic tools of semidefinite 
programming (SDP). Indeed, the maximum likelihood estimation problems for block models amount 
to cut problems on graphs, which have a rich history of study through SDPs (for example [GW95,
FJ94]): it is natural for these to appear among the most powerful tools for community detection.

In recent work of Abbe and Sandon [AS15a], alternative methods of spectral clustering and
local refinement have taken the lead in exact recovery, providing very general and sharp algorithmic 
results. It is natural to wonder how well SDP approaches can match these results, particularly in 
the light of strong robustness properties that convex programs enjoy. In this paper, we extend 
the frontier of optimal results on SDPs for exact recovery to the case of multiple communities of 
unequal sizes. We further address the relationship of SDPs to semirandom models, highlighting their 
robustness as an advantage over spectral techniques, but also exposing how this same robustness 
can pose fundamental limits to SDP approaches. 

1.2 Models 

1.2.1 Stochastic block model 

The stochastic block model is a generative model for graphs, in which we suppose a vertex set 
of size n has been partitioned into r disjoint ‘communities’ $S_1, \ldots, S_r$. An undirected graph is
randomly generated from this hidden partition as follows: every unordered pair (u, v) of distinct
vertices is independently connected by an edge with probability $Q_{ij}$, where u, v belong to communities
i, j respectively. Here Q is a symmetric r x r matrix. Given a graph sampled from this model,
the goal is to recover the underlying partition. 
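
The generative process above is straightforward to simulate. The following is a minimal sketch in Python using only the standard library; the names `sample_sbm` and `planted_partition_matrix` are illustrative and do not come from the paper, and vertices are assumed to be numbered community by community.

```python
import random

def planted_partition_matrix(r, p, q):
    """Return the r x r matrix Q with Q[i][i] = p and Q[i][j] = q for i != j."""
    return [[p if i == j else q for j in range(r)] for i in range(r)]

def sample_sbm(sizes, Q, seed=0):
    """Sample an undirected graph from the stochastic block model.

    sizes -- community sizes; vertices are numbered community by community
    Q     -- symmetric matrix of edge probabilities between communities
    Returns (labels, edges): labels[u] is the community of vertex u, and
    edges is a set of pairs (u, v) with u < v.
    """
    rng = random.Random(seed)
    labels = [i for i, s in enumerate(sizes) for _ in range(s)]
    n = len(labels)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            # Each unordered pair is connected independently.
            if rng.random() < Q[labels[u]][labels[v]]:
                edges.add((u, v))
    return labels, edges

# Three communities of unequal sizes in the planted partition model.
labels, edges = sample_sbm([30, 20, 10], planted_partition_matrix(3, 0.5, 0.05))
```

A recovery algorithm would be given only `edges` and asked to reconstruct `labels` up to a permutation of the community names.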

Many papers specialize to the planted partition model, the case where $Q_{ii} = p$ for all i and
$Q_{ij} = q$ for all $i \ne j$. We will largely work with this specialization, and occasionally discuss the
more general model. 
1.2.2 Semirandom models

Much work in community detection focuses on the Erdős-Rényi-style random models defined above.
These models are mathematically convenient, owing to the independence of edges, yet most real-world
networks do not take this form. Moreover, there is cause for concern that some algorithms
developed for the models above are highly brittle and do not generalize to other graph models: as
illustrated in [RJM16], one spectral method with strong theoretical guarantees degrades very rapidly
when a sparse planted partition model is perturbed by adding small cliques, which are otherwise 
very infrequent in this random model, but do occur frequently in many real-world graphs. 

Extending beyond the random models above, Blum and Spencer [BS95] introduced the notion
of semirandom models for graph problems, as an intermediate model between average-case
and worst-case performance. The idea, extended by Feige and Kilian [FK01], is to generate a
preliminary graph according to a random model, but then allow a monotone adversary to make 
unbounded, arbitrarily structured changes of a nature that should only help the algorithm by 
further revealing the ground truth. Although these changes may seem to make the problem easier, 
they may significantly alter the distribution of observations revealed to the algorithm, breaking 
statistical assumptions made about the observations. Semirandom models are thus a means of 
penalizing brittle algorithms that are over-tuned to particular random models. Semirandom models 
are no easier than their random counterparts, as the adversary may opt to make no changes. 

Following [FK01], we define the semirandom planted partition model as follows. A graph
is first generated according to the planted partition model, and then a monotone adversary may
arbitrarily add edges within communities, and remove edges between communities. The semirandom
model can simulate aspects of real-world graphs and other graph models, such as a wide range
of degree or subgraph profiles, while ensuring that the true community structure remains present 
in the graph. Thus the semirandom model aims to capture the unpredictable nature of real data. 

An algorithm is called robust to monotone adversaries if, whenever it succeeds on a sample from 
the random model, it also succeeds after arbitrary monotone changes to that sample. Algorithms 
based on semidefinite programming (SDP) are typically robust, or can be modified to be robust, 
and essentially all known robust algorithms are based on convex programs such as SDPs. This 
property guarantees some ability of such algorithms to generalize to other models; by contrast, 
algorithms that over-exploit precise statistics of a random model will typically fail against similar 
random models with different statistics, and will also typically fail against a semirandom model. 

1.2.3 Regimes 

The stochastic block model admits two major regimes of study, along with other variants. The 
main distinction is between partial recovery, in which the goal is only to recover a partition that 
is reliably correlated with the true partition better than random guessing, versus exact recovery, 
in which the partition must be recovered perfectly. In partial recovery, one tends to take the within-community
edge probability p and the between-community edge probability q to be $\Theta(1/n)$, whereas
in exact recovery one takes them to be $\Theta(\log n / n)$. In these asymptotic regimes one observes a
sharp threshold behavior: within some range of parameters p and q, the problem is statistically
impossible, and outside of that range one can find algorithms that succeed with high probability.
For partial recovery, this is established in [MNS14, MNS13, Mas14], and for exact recovery, the
most general result on this threshold is established in [AS15a].

This paper concerns the case of exact recovery. Thus we take probabilities $p = \tilde p \log n / n$,
$q = \tilde q \log n / n$, and we suppose that the vertex set is partitioned into r communities $S_1, \ldots, S_r$ of
sizes $s_i = |S_i|$. We write $\pi_i = s_i / n$, the proportion of vertices lying in community i. More precisely,
we only require $s_i / n \to \pi_i$ as $n \to \infty$; for instance, each vertex could be independently randomly
assigned to a community, with community i chosen with probability $\pi_i$. We hold r, $\tilde p$, $\tilde q$, and $\pi_i$
constant as $n \to \infty$ (these are the parameters for the problem) and we aim to exactly recover
the communities $S_i$ (up to permutation) with high probability, by which we mean probability
$1 - o(1)$ as $n \to \infty$, within as broad a range of parameters as possible.

We specialize to the assortative planted partition model, where p > q; some techniques may
transfer to the dissortative p < q case, by judicious negations, but we do not elaborate on this.

There remains one more distinction to make: we have not yet specified whether the parameters 
r, p, q, and $\pi_i$ are hidden or known to the recovery algorithm. We present algorithms for two cases:
the known sizes case, where the community proportions $\pi_i$ are known but p and q are potentially
unknown; and the unknown sizes case, where the $\pi_i$ are unknown but p, q, and the number r
of communities are known. Even the case of fully unknown parameters is tractable [AS15b], but
there are fundamental barriers that prevent robust approaches such as ours from achieving this.
We elaborate on this in Section 4.

1.3 Contributions and prior work 

The main result of this paper is that two variants of a certain SDP achieve exact recovery, up to the 
information-theoretic threshold, against the planted partition model with multiple communities of 
potentially different sizes. These SDPs are furthermore robust to monotone adversaries. 

There has been considerable prior work on the development of algorithms and lower bounds 
for the stochastic block model, with algorithms making use of a wide range of techniques; see for 
instance the introduction to [ABH16] and works cited there. The use of semidefinite programming 
for exact recovery originated with [FK01], who achieved robust exact recovery in the case of two
communities of equal sizes, falling slightly short of the optimal performance threshold. More
recently, semidefinite algorithms have been found especially effective on sparse graphs [CSX12,
ABH16], and have been proven to match lower bounds for the planted partition model in the case
of two equal-sized communities [Ban15, HWX15a], two different-sized communities [HWX15b],
and multiple equal-sized communities [HWX15b, ABKK15], but the case of recovering multiple
communities of different sizes through SDPs remained unresolved until now. The SDP that we
consider in this paper has appeared before in the literature [GV15, ABKK15], and in fact Agarwal
et al. [ABKK15] conjectured that it achieves exact recovery up to the threshold; the main result of
this paper resolves this conjecture affirmatively. 

Abbe and Sandon [AS15a] established the information-theoretic threshold for the general stochastic
block model with individual community-pair probabilities $Q_{ij}$, which we visit in Section 3. In
addition to proving a sharp lower bound, they analyzed an algorithm that succeeds with high probability
up to their lower bound, thus precisely determining the statistical limits for exact recovery.
Their result may appear strictly more general than ours, applying to the general stochastic block 
model rather than the more specific planted partition model. However, their algorithm involves 
a highly-tuned spectral clustering step that is likely not robust to monotone adversaries or other 
forms of perturbation (see Section 4 for discussion), and so our work can be seen as an improvement
from the robustness standpoint. 

In fact, there are barriers against more general semirandom results. We show in Section 4 that
no algorithm robust to monotone adversaries can handle the case of fully unknown parameters, nor 
can it extend from the planted partition model to the slightly more general strongly assortative 
block model. Thus our SDP-based approach is already essentially as general as one could hope for 
without compromising the strength of its robustness. 

Although we are primarily interested in the exact recovery regime, we will take a brief detour 
and survey the parallel line of work on partial recovery. In contrast to exact recovery, none of the
partial recovery algorithms that succeed down to the information-theoretic threshold are based on
SDPs, and they have no robustness guarantees. There are robust SDP methods known [GV15,
MPW16, MMV16] but they fall short of the threshold; this gap is unavoidable, as it has been shown
that no algorithm can robustly achieve the partial recovery threshold [MPW16]. Although robust
algorithms for partial recovery cannot reach the threshold, there is an SDP that is conjectured to
come quite close [JMR16] and is known to achieve the threshold in the limit of large average degree
[MS16]; however, this analysis does not come with a robustness guarantee. By contrast, this paper
shows that robust recovery is efficiently achievable up to the threshold in the exact recovery setting. 

Community detection in semirandom models was previously discussed in [FK01, CSX14, AL14],
and further since the first appearance of this paper, in [MPW16, MMV16, HWX15a]. Other works
[CL15, MMV16] discuss robustness to some quantity of corruptions or outlier nodes.

1.4 Overview of techniques 

In this section we summarize the arguments by which we prove that our SDPs achieve exact 
recovery against the semirandom planted partition model. More precisely, we show that with 
high probability, the unique SDP optimum is a matrix that exactly encodes the correct community 
structure; in contrast to some SDP-based algorithms (both for community detection and otherwise), 
there is no ‘rounding’ or post-processing procedure required. 

The first step is to show robustness in the sense that if the SDP succeeds against a particular 
graph, then it will continue to succeed if that graph is modified by monotone changes. This follows 
from a simple argument in [FK01] which essentially observes that the SDP optimizes over a space
of solutions, and monotone changes improve the objective value of the true solution more than they 
improve the objective value of any other solution. 

In light of the above argument, it suffices to show that our SDP succeeds with high probability 
against the random model. As in previous work on SDPs for exact recovery, our proof proceeds 
by constructing a dual certificate. The idea here is that the SDP is a maximization problem 
for which the true solution is feasible; what we need to show is that no other solution has a 
larger objective value than the true one, which can be done by finding a matching solution to 
the dual SDP. This “dual certificate” that we construct depends on the random graph and can 
be shown to be dual-feasible with high probability. As is typical, the construction of the dual 
certificate is guided by complementary slackness, which provides a set of necessary conditions. 
However, these necessary conditions are not enough to uniquely determine the dual certificate and 
so some creativity is necessary here in order to find an optimal certificate that gets all the way to 
the threshold. Part of what makes the general problem harder than the special cases considered 
previously [HWX15b] is that the general case has more dual variables that are not automatically
determined by complementary slackness but that nonetheless need to be chosen carefully. The 
crucial step in the construction of our dual certificate is that by making a change of variables 
we are able to find a connection between the complementary slackness conditions and the non-negativity
of differences of certain binomial random variables, which in turn are closely related to
the information-theoretic threshold. It then becomes clear which parts of the dual solution need to 
crucially be set a particular way, and which ones have “wiggle room.” As is typical, showing that 
our dual matrix is positive semidefinite relies in part on the spectral concentration of the adjacency 
matrix, e.g. Theorem 5.2 in [LR15].
1.5 Organization of this paper

In Section 2 we present our two closely-related semidefinite programs for exact recovery. In Section 3
we discuss the information-theoretic threshold determined in [AS15a]. In Section 4 we show that
our SDPs are robust to the semirandom model, and we give some impossibility results for robust
algorithms. In Section 5 we prove our main result: our SDPs achieve exact recovery up to the
information-theoretic threshold. 

2 Semidefinite algorithms and results 

In this section we derive semidefinite programs for exact recovery, by taking convex relaxations of 
maximum likelihood estimators. Throughout we will use the letters u, v for vertices and the letters 
i,j for communities. 

2.1 Maximum likelihood estimators 

Given an observed n-vertex graph, a natural statistical approach to recovering the community structure
is to compute a maximum-likelihood estimate (MLE). We begin by stating the log-likelihood
of a candidate partition into communities: 

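$$\log \mathcal{L} = \sum_{\substack{u \sim v \\ \text{same community}}} \log p + \sum_{\substack{u \sim v \\ \text{diff community}}} \log q + \sum_{\substack{u \not\sim v \\ \text{same community}}} \log(1-p) + \sum_{\substack{u \not\sim v \\ \text{diff community}}} \log(1-q).$$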

Here $\sim$ denotes adjacency in the observed graph. We can represent a partition by its $n \times n$ partition
matrix

$$X_{uv} = \begin{cases} 1 & \text{if } u \text{ and } v \text{ are in the same community,} \\ 0 & \text{otherwise.} \end{cases}$$

In terms of X and the observed (0, 1)-adjacency matrix A, we can write

$$\begin{aligned} 2 \log \mathcal{L}(X) = {} & \langle A, X \rangle \log p + \langle A, J - X \rangle \log q \\ & + \langle J - I - A, X \rangle \log(1-p) + \langle J - I - A, J - X \rangle \log(1-q), \end{aligned}$$

where I is the identity matrix, J is the all-ones matrix, and $\langle \cdot, \cdot \rangle$ denotes the Frobenius (entry-wise)
inner product of matrices. Expanding, and discarding terms that do not depend on X, including
$\langle X, I \rangle = n$:



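$$2 \log \mathcal{L}(X) + \text{const} = \alpha \langle A, X \rangle - \beta \langle J, X \rangle = \alpha \langle A, X \rangle - \beta \sum_{i=1}^{r} s_i^2,$$

where

$$\alpha = \log \frac{p(1-q)}{q(1-p)}, \qquad \beta = \log \frac{1-q}{1-p}.$$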


The assumption p > q implies that $\alpha$ and $\beta$ are positive.
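
The reduction of the log-likelihood to a linear function of X can be sanity-checked numerically. The sketch below (standard-library Python; all function names are illustrative) evaluates twice the log-likelihood directly over ordered vertex pairs and compares it with $\alpha \langle A, X \rangle - \beta \langle J, X \rangle$ for several random partitions of the same graph. If the derivation is right, the difference is the same constant for every partition.

```python
import math
import random

def partition_matrix(labels):
    """0/1 partition matrix: entry is 1 when u and v share a community."""
    return [[1 if a == b else 0 for b in labels] for a in labels]

def two_log_likelihood(A, X, p, q):
    """Sum over ordered pairs u != v, i.e. twice the log-likelihood."""
    n = len(A)
    total = 0.0
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            prob = p if X[u][v] == 1 else q
            total += math.log(prob) if A[u][v] else math.log(1 - prob)
    return total

def linear_objective(A, X, p, q):
    """alpha * <A, X> - beta * <J, X> with alpha, beta as defined in the text."""
    alpha = math.log(p * (1 - q) / (q * (1 - p)))
    beta = math.log((1 - q) / (1 - p))
    n = len(A)
    inner_AX = sum(A[u][v] * X[u][v] for u in range(n) for v in range(n))
    inner_JX = sum(X[u][v] for u in range(n) for v in range(n))
    return alpha * inner_AX - beta * inner_JX

rng = random.Random(1)
n, p, q = 12, 0.6, 0.2

# An arbitrary symmetric 0/1 adjacency matrix with zero diagonal.
A = [[0] * n for _ in range(n)]
for u in range(n):
    for v in range(u + 1, n):
        A[u][v] = A[v][u] = int(rng.random() < 0.4)

# The offset between the two quantities should not depend on the partition.
offsets = []
for _ in range(5):
    X = partition_matrix([rng.randrange(3) for _ in range(n)])
    offsets.append(two_log_likelihood(A, X, p, q) - linear_objective(A, X, p, q))
```

The spread `max(offsets) - min(offsets)` is zero up to floating-point error, which is the numerical counterpart of the "+ const" in the display above.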

At this stage it is worth distinguishing the two cases of known and unknown sizes. In the first 
form, the block sizes $s_i$ are known; then the second term above is a constant, and the MLE amounts
to a minimum fixed-sizes multisection problem, and is NP-hard for worst-case A. 
Program 1 (Known sizes MLE).

maximize $\langle A, X \rangle$

over partition matrices X with community sizes $s_1, \ldots, s_r$.

In the second form of the problem, the block sizes $s_i$ are unknown (but r is known). Now the
MLE requires knowledge of p and q, and the resulting regularized minimum multisection problem
is also likely computationally hard in general. 

Program 2 (Unknown sizes MLE). 

maximize $\langle A, X \rangle - \omega \langle J, X \rangle$

over all partition matrices X on r communities, 

where

$$\omega = \frac{\beta}{\alpha} = \frac{\log(1-q) - \log(1-p)}{\log p + \log(1-q) - \log q - \log(1-p)}. \tag{1}$$

Although it is not obvious from the definition, we can think of $\omega$ as a sort of average of p and
q; the proof is deferred to Appendix A.

Lemma 1. For all 0 < q < p < 1, we have $q < \omega < p$.
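
Lemma 1 is easy to probe numerically before reading its proof. The following sketch (standard-library Python; the function name `omega` is illustrative) evaluates the ratio $\beta / \alpha$ from Program 2 on a grid of pairs q < p and records whether it lies strictly between q and p. This is a spot check on finitely many points, not a substitute for the proof.

```python
import math

def omega(p, q):
    """The regularization weight beta / alpha from Program 2."""
    beta = math.log(1 - q) - math.log(1 - p)
    alpha = math.log(p) + math.log(1 - q) - math.log(q) - math.log(1 - p)
    return beta / alpha

# All pairs q < p on a grid 0.05, 0.10, ..., 0.95.
grid = [k / 20 for k in range(1, 20)]
checks = [(q, omega(p, q), p) for p in grid for q in grid if q < p]
lemma_holds_on_grid = all(q < w < p for q, w, p in checks)
```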

In order to compute either MLE in polynomial time, we will pass to a relaxation and show that 
the computation succeeds with high probability. 

2.2 Semidefinite algorithms 

The seminal work of [GW95] began a successful history of semidefinite relaxations for cut problems.
Proceeding roughly in this vein to write a convex relaxation for the true feasible region of partition 
matrices of given sizes, one might reasonably arrive at the following relaxation of the known-sizes 
MLE: 

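Program 3 (Known sizes, primal, weak form).

$$\begin{aligned}
\text{maximize} \quad & \langle A, X \rangle \\
\text{subject to} \quad & \langle J, X \rangle = \sum_i s_i^2, \\
& X_{uu} = 1 \quad \forall u, \\
& X \ge 0, \\
& X \succeq 0.
\end{aligned}$$

Here $X \ge 0$ indicates that X is entry-wise nonnegative, and $X \succeq 0$ indicates that X is positive
semidefinite (and, in particular, symmetric).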

This SDP appears in [AL14] (under the name SDP-2’). Under the assumption of equal-sized
communities, a stronger form involving row sum constraints appears in [AL14, HWX15b, ABKK15],
and the latter two papers find that this strengthening achieves exact recovery up to the information-theoretic
lower bound in the case of equal-sized communities.

In the case of unequal community sizes, it is more difficult to pursue this line of strengthening. 
The authors have analyzed the weaker SDP above for multiple unbalanced communities, and found 
that it does achieve exact recovery within some parameter range in the logarithmic regime, but its
threshold for exact recovery is strictly worse than the information-theoretic threshold. (We have 
opted to omit these proofs from this paper, in view of our stronger results on the programs below.) 

Instead, we revisit the somewhat arbitrary decision to encode the partition matrix with entries 
0 and 1. Indeed, SDPs for the unbalanced two-community case tend to use entries -1 and 1
[FK01, ABH16, HWX15b], with success up to the information-theoretic lower bound [HWX15b].
Some choices of entry values will result in a non-PSD partition matrix, so we opt for the choice
of entries for which the partition matrix is only barely PSD: namely, we define the centered
partition matrix

$$X_{uv} = \begin{cases} 1 & \text{if } u \text{ and } v \text{ are in the same community,} \\ -\frac{1}{r-1} & \text{otherwise.} \end{cases}$$

Recall that r is the number of communities. This matrix is PSD as it is a Gram matrix: if we
geometrically assign to each community a vector pointing to a different vertex of a centered regular
simplex in $\mathbb{R}^{r-1}$, the resulting Gram matrix is precisely the centered partition matrix.
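
The Gram-matrix argument can be made concrete. The sketch below (standard-library Python; function names are illustrative) builds unit vectors at the vertices of a centered regular simplex, assuming the off-diagonal value $-1/(r-1)$ for vertices in different communities, and checks that their Gram matrix coincides with the centered partition matrix. For convenience the simplex is embedded in $\mathbb{R}^r$; the vectors sum to zero, so they span only an (r - 1)-dimensional subspace.

```python
import math

def simplex_vectors(r):
    """Unit vectors e_i - (1/r) * ones, rescaled to norm 1.

    They are the vertices of a centered regular simplex, embedded in R^r.
    """
    scale = math.sqrt(r / (r - 1))
    return [[scale * ((1.0 if k == i else 0.0) - 1.0 / r) for k in range(r)]
            for i in range(r)]

def centered_partition_matrix(labels, r):
    """Entry 1 for same community, -1/(r-1) otherwise."""
    off = -1.0 / (r - 1)
    return [[1.0 if a == b else off for b in labels] for a in labels]

def gram_from_labels(labels, r):
    """Gram matrix when each vertex is assigned its community's simplex vector."""
    vecs = simplex_vectors(r)
    return [[sum(x * y for x, y in zip(vecs[a], vecs[b])) for b in labels]
            for a in labels]

labels = [0, 0, 0, 1, 1, 2]   # community sizes 3, 2, 1
r = 3
X = centered_partition_matrix(labels, r)
G = gram_from_labels(labels, r)

# Sum of all entries of X, to compare with r/(r-1) * sum(s_i^2) - n^2/(r-1).
total = sum(map(sum, X))
```

Here `X` and `G` agree entry by entry, and `total` equals $\frac{r}{r-1}\sum_i s_i^2 - \frac{n^2}{r-1} = \frac{3}{2} \cdot 14 - \frac{36}{2} = 3$, the value that a sizes constraint on $\langle J, X \rangle$ must take for centered partition matrices.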

Aiming to recover this matrix, we reformulate our SDP as follows: 

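Program 4 (Known sizes, primal, strong form).

$$\begin{aligned}
\text{maximize} \quad & \langle A, X \rangle \\
\text{subject to} \quad & \langle J, X \rangle = \frac{r}{r-1} \sum_i s_i^2 - \frac{n^2}{r-1}, \\
& X_{uu} = 1 \quad \forall u, \\
& X \ge -\frac{1}{r-1}, \\
& X \succeq 0.
\end{aligned}$$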



This SDP bears a strong similarity to classical SDPs for maximum r-cut [FJ94], and [ABKK15]
conjectured that it achieves exact recovery for unbalanced multisection up to the information-theoretic
lower bound. Our main result resolves this conjecture affirmatively.

Other than the natural motivation of following classical approaches to r-cut problems, the 
change from partition matrices to centered partition matrices buys us something mathematically 
concrete: the intended primal solution now has rank r - 1 instead of rank r, which through complementary
slackness will entail one fewer constraint on a candidate dual optimum.

We can write down a very similar relaxation for the MLE in the case of unknown sizes but 
known p and q: 

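Program 5 (Unknown sizes, primal, strong form).

$$\begin{aligned}
\text{maximize} \quad & \langle A, X \rangle - \omega \langle J, X \rangle \\
\text{subject to} \quad & X_{uu} = 1 \quad \forall u, \\
& X \ge -\frac{1}{r-1}, \\
& X \succeq 0.
\end{aligned}$$

Here $\omega$ is as defined in (1). Our main assertion is that these SDPs achieve exact recovery up to
the lower bounds in [AS15a]: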
Theorem 2. Given input from the planted partition model with parameters $(p, q, \pi)$, Programs 4
(known sizes) and 5 (unknown sizes) recover the true centered partition matrix as the unique optimum,
with probability 1 - o(1) over the random graph, within the information-theoretically feasible
range of parameters described by Theorem 3.

To reiterate, the known sizes SDP (Program 4) requires knowledge of the sum of squares of
community sizes (equivalently, $\sum_i \pi_i^2$) and the number r of communities, but p and q can be
unknown; the unknown sizes SDP (Program 5) requires knowledge of p, q, and r (or at least $\omega$ and
r), but the community proportions $\pi$ can be unknown.

Program 5 can tolerate some mis-specification in the value of $\omega$, though this tolerance may shrink
to zero as the problem parameters approach the information-theoretic threshold. This tolerance is
implicit in the proof of Theorem 2.


3 Lower bounds 


In this section we visit the general information-theoretic lower bounds established by [AS15a],
and we specialize them to the planted partition model, recovering bounds that directly generalize
those of [HWX15b] for the case of two unequal communities. Meanwhile, we recall operational
interpretations of the lower bounds, which will be our main concrete handle on them.

Consider the general stochastic block model (probabilities $Q_{ij}$) in the regime where community
sizes are $s_i = \pi_i n$ and probabilities are $Q_{ij} = \tilde Q_{ij} \log n / n$, with $\pi$ and $\tilde Q$ constant as $n \to \infty$. The
threshold for exact recovery is as follows:

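Theorem 3 ([AS15a]). For $i \ne j$ define the Chernoff–Hellinger divergence

$$D_+(i,j) \triangleq \sup_{t \in [0,1]} \sum_k \pi_k \left( t \tilde Q_{ik} + (1-t) \tilde Q_{jk} - \tilde Q_{ik}^{\,t} \tilde Q_{jk}^{\,1-t} \right). \tag{2}$$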

Then exact recovery is information-theoretically solvable if, for all pairs $i \ne j$,

$$D_+(i,j) > 1. \tag{3}$$


Within this range, there exist efficient algorithms that succeed with probability 1 — o(l). 

Conversely, exact recovery is impossible if there exists a pair (i, j) with $D_+(i,j) < 1$. Within
this range, any estimator for the community structure (even knowing the model parameters) will
fail with probability 1 - o(1).

[AS15a] also shows that the borderline case $D_+(i,j) = 1$ remains solvable; for simplicity we
neglect this case.

As mentioned in [AS15a], if we specialize as far as the planted partition model with two equal-sized
communities, then the CH-divergence is attained at $t = 1/2$ and one recovers the threshold
$\sqrt{\tilde p} - \sqrt{\tilde q} > \sqrt{2}$, as seen in [ABH16] and other works. In Appendix B we compute the following
generalization:


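Proposition 4. For the planted partition model with parameters $(\tilde p, \tilde q, \pi)$, where $p = \tilde p \log n / n$
and $q = \tilde q \log n / n$, the CH-divergence is given by

$$D_+(i,j) = \pi_i \tilde q + \pi_j \tilde p - \gamma + \frac{\tau(\pi_i - \pi_j)}{2} \log\left( \frac{\pi_j \tilde p}{\pi_i \tilde q} \cdot \frac{\tau(\pi_i - \pi_j) + \gamma}{\tau(\pi_j - \pi_i) + \gamma} \right),$$

where

$$\gamma = \sqrt{\tau^2 (\pi_i - \pi_j)^2 + 4 \pi_i \pi_j \tilde p \tilde q}, \qquad \tau = \frac{\tilde p - \tilde q}{\log \tilde p - \log \tilde q}.$$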

Note that in our regime (logarithmic average degree), we have

$$\omega = \tau \log n / n + o(\log n / n). \tag{4}$$
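
Both the closed form in Proposition 4 and the asymptotic relation between $\omega$ and $\tau$ can be checked numerically. The sketch below (standard-library Python; all function names are illustrative) compares the closed form against a grid search over t in the definition of the CH-divergence, specialized to the planted partition model, where only the terms k = i and k = j contribute. Here `p` and `q` passed to the divergence functions are the rescaled constants, while `omega` takes actual edge probabilities.

```python
import math

def tau(p, q):
    return (p - q) / (math.log(p) - math.log(q))

def ch_divergence_closed_form(p, q, pi_i, pi_j):
    """Closed form of Proposition 4; p and q are the rescaled constants."""
    d = tau(p, q) * (pi_i - pi_j)
    gamma = math.sqrt(d * d + 4 * pi_i * pi_j * p * q)
    ratio = (pi_j * p) / (pi_i * q) * (gamma + d) / (gamma - d)
    return pi_i * q + pi_j * p - gamma + 0.5 * d * math.log(ratio)

def ch_divergence_grid(p, q, pi_i, pi_j, steps=20000):
    """Grid search over t in [0, 1] of the supremum defining the divergence."""
    best = 0.0
    for k in range(steps + 1):
        t = k / steps
        val = (pi_i * (t * p + (1 - t) * q - p**t * q**(1 - t))
               + pi_j * (t * q + (1 - t) * p - q**t * p**(1 - t)))
        best = max(best, val)
    return best

def omega(p, q):
    """beta / alpha for actual edge probabilities p and q (log1p for accuracy)."""
    beta = math.log1p(-q) - math.log1p(-p)
    alpha = math.log(p) - math.log(q) + beta
    return beta / alpha

closed = ch_divergence_closed_form(9.0, 1.0, 0.5, 0.3)
grid = ch_divergence_grid(9.0, 1.0, 0.5, 0.3)

# With p = 9 log n / n and q = log n / n, omega / (tau log n / n) should approach 1.
n = 10**8
scale = math.log(n) / n
ratio_to_tau = omega(9.0 * scale, 1.0 * scale) / (scale * tau(9.0, 1.0))
```

For two equal-sized communities the closed form reduces to $(\sqrt{\tilde p} - \sqrt{\tilde q})^2 / 2$, so `ch_divergence_closed_form(9.0, 1.0, 0.5, 0.5)` evaluates to 2, comfortably above the threshold of 1.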

The divergence expression in Proposition 4 strongly resembles the lower bound proven in
[HWX15b] for the case of two communities of different sizes. Indeed, in the notation of Lemma 2
of [HWX15b], we recognize our expression as

$$g(\pi_i, \pi_j, \tilde p, \tilde q, \tau(\pi_i - \pi_j)).$$

From that lemma, the following operational definition of the CH-divergence is immediate for the 
planted partition model: 

Lemma 5. Let $i \ne j$ be communities, and let v be a vertex in community i. Let E(v, j) denote the
number of edges from v to vertices in community j. Suppose that $T(n) = \tau(\pi_i - \pi_j) \log n + o(\log n)$.
The probability of the tail event $E(v,i) - E(v,j) \le T(n)$ is $n^{-D_+(i,j) + o(1)}$.

By a naive union bound, when $D_+(i,j) > 1$ for all pairs i, j, then we can assert with high
probability that none of these tail events occur, over all vertices and communities.

A similar operational interpretation is given directly in [AS15a], phrased in terms of hypothesis
testing between multivariate Poisson distributions. This result keeps more complete track of the
o(1) term, so as to guarantee the union bound even when $D_+(i,j) = 1$.

Lastly we note the following monotonicity property of the divergence, with a proof deferred to
Appendix C.

Proposition 6. In the planted partition model, the CH-divergence $D_+(i,j)$ is monotone increasing
in $\pi_i$ and $\pi_j$ (for any fixed p, q).

Thus, when determining whether exact recovery is feasible in the planted partition model for 
some set of parameters, it suffices to check the CH-divergence between the two smallest communities.

4 Semirandom robustness and its consequences 

4.1 The semirandom model 

Let X be the ground truth partition of the vertices into communities. 

Definition 1. Following [FK01] we define a monotone adversary to be a process which takes as
input a graph (for instance, a random graph sampled from the stochastic block model with ground 
truth X) and makes any number of monotone changes of the following types: 

• The adversary can add an edge between two vertices in the same community of X. 

• The adversary can remove an edge between two vertices in different communities of X. 

These monotone changes appear to only strengthen the presence of the true community structure 
X in the observed graph, yet they may destroy statistical properties of the random model. The 
semirandom model is designed to penalize brittle algorithms that over-rely on specific stochastic 
models. It does not intend to mimic any real-world adversarial scenario, but it does intend to model 
the inherent unpredictability of real-world data. 
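
To make Definition 1 concrete, here is a minimal sketch of one possible monotone adversary in standard-library Python. The function names and the random choice of changes are illustrative assumptions; an actual adversary may pick its changes arbitrarily and with full knowledge of the graph, and the random version below is only the simplest instance that satisfies the definition.

```python
import random

def monotone_adversary(labels, edges, add_prob=0.3, remove_prob=0.3, seed=0):
    """Return a new edge set after random monotone changes.

    Within-community non-edges are added with probability add_prob, and
    between-community edges are removed with probability remove_prob.
    """
    rng = random.Random(seed)
    n = len(labels)
    result = set(edges)
    for u in range(n):
        for v in range(u + 1, n):
            same = labels[u] == labels[v]
            if same and (u, v) not in edges:
                if rng.random() < add_prob:
                    result.add((u, v))
            elif not same and (u, v) in edges:
                if rng.random() < remove_prob:
                    result.discard((u, v))
    return result

def is_monotone(labels, before, after):
    """Check that every added edge is internal and every removed edge is external."""
    added, removed = after - before, before - after
    return (all(labels[u] == labels[v] for u, v in added)
            and all(labels[u] != labels[v] for u, v in removed))

labels = [0, 0, 0, 1, 1, 1]
edges = {(0, 1), (1, 2), (3, 4), (2, 3), (0, 5)}   # pairs (u, v) with u < v
perturbed = monotone_adversary(labels, edges)
```

Setting both probabilities to 1 produces the extreme case in which every community becomes a clique and all cross-community edges disappear.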

It may help to consider examples of how such an adversary could break an algorithm: 
• Many algorithms perform PCA on the adjacency matrix [LR15]. The adversary could plant a
slightly denser sub-community structure in a community, thus splitting one cluster of vertices 
into several nearby sub-clusters in the PCA, and introducing doubt as to which granularity 
of clustering is appropriate. 

• An adversary could introduce a noise distribution that changes the shape of clusters of vertices 
in the PCA or spreads them out, resulting in either a failure to cluster correctly, or else a 
failure in subsequent steps of estimating parameters and improving the community structure 
(as in [AS15b]).

These are extreme examples, but they correspond to realistic concerns: 

• Real community structure is sometimes hierarchical, e.g. tight friend groups within a larger 
social community. 

• Many real networks have hubs, or a degree distribution that is nowhere near Gaussian, so 
hypothesis tests designed for distributions of roughly Gaussian shape may have trouble generalizing.

4.2 Robustness 

In this section, we establish that our SDPs are robust to monotone adversaries. We first elaborate 
on the definitions discussed in the introduction. 

Definition 2. Suppose f is a (deterministic) algorithm for recovery in the stochastic block model,
namely f takes in an adjacency matrix A and outputs a partition f(A) of the vertices. We say f
is robust to monotone adversaries if: for any A such that f(A) = X, we have f(A') = X for
any A' obtained from A via a sequence of monotone changes.

We modify this definition slightly for SDPs in order to deal with the fact that an SDP may not 
have a unique optimum. By abuse of notation, let X also refer to the centered partition matrix 
corresponding to the partition X. 

Definition 3. Suppose $P_A$ is a semidefinite program which depends on the adjacency matrix A
(for instance, Program 4 or Program 5). We say $P_A$ is robust to monotone adversaries if: for
any A such that X is the unique optimum to $P_A$, we have that X is the unique optimum to $P_{A'}$
for any A' obtained from A via a sequence of monotone changes.

SDPs tend to possess such robustness properties. We will now show that our SDPs are no 
exception, following roughly the same type of argument as [FK01].

Proposition 7. Programs 4 (known sizes) and 5 (unknown sizes) are robust to monotone adversaries.

Proof. Let $P_A$ be either Program 4 or Program 5 (the proof is identical for both cases). Suppose the 
true centered partition matrix $\hat X$ is the unique optimum for $P_A$. By induction it is sufficient to show 
that $\hat X$ is the unique optimum for $P_{A'}$ where $A'$ is obtained from $A$ via a single monotone change. 
Note that $P_A$ and $P_{A'}$ have the same feasible region because $A$ only affects the objective function. 
Let $P_A(X)$ denote the objective value of a candidate solution $X$ for $P_A$, namely $P_A(X)$ is $\langle A, X\rangle$ 
for Program 4 and $\langle A, X\rangle - \omega\langle J, X\rangle$ for Program 5. 

First consider the case where $A'$ is obtained from $A$ via a single monotone edge-addition step. 
Since the added edge lies within a community of $\hat X$ we have $P_{A'}(\hat X) = P_A(\hat X) + 2$. For any 
matrix $X$ feasible for $P_A$ (equivalently, feasible for $P_{A'}$) we have $P_{A'}(X) \le P_A(X) + 2$; this 
follows from $X \le 1$ (entry-wise), which is implied by the constraints $X_{vv} = 1$ and $X \succeq 0$. 
If $X \ne \hat X$ we have 

$$P_{A'}(\hat X) = P_A(\hat X) + 2 > P_A(X) + 2 \ge P_{A'}(X)$$ 

and so $\hat X$ is the unique optimum of $P_{A'}$. Similarly, for the case where $A'$ is obtained from $A$ via a 
single monotone edge-removal, we have $P_{A'}(\hat X) = P_A(\hat X) + \frac{2}{r-1}$ and $P_{A'}(X) \le P_A(X) + \frac{2}{r-1}$ (using 
the constraint $X \ge -\frac{1}{r-1}$) and the result follows. □ 
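
The two objective shifts used in this proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the helper names `centered_partition_matrix` and `objective`, the toy graph, and the value 0.3 standing in for $\omega$ are all illustrative assumptions. It evaluates the Program 5 objective at the true centered partition matrix before and after one monotone change.

```python
import numpy as np

def centered_partition_matrix(labels, r):
    """Entries 1 within a community and -1/(r-1) across communities."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return np.where(same, 1.0, -1.0 / (r - 1))

def objective(A, X, omega=0.0):
    """<A, X> - omega * <J, X>; omega = 0 gives the Program 4 objective."""
    return float(np.sum(A * X) - omega * np.sum(X))

labels = [0, 0, 0, 1, 1, 2, 2]   # hypothetical toy partition, r = 3
r = 3
X_hat = centered_partition_matrix(labels, r)

A = np.zeros((7, 7))
for u, v in [(0, 1), (3, 4), (2, 5)]:        # (2, 5) is an inter-community edge
    A[u, v] = A[v, u] = 1.0

A_add = A.copy()
A_add[0, 2] = A_add[2, 0] = 1.0              # monotone change: add an intra-community edge
A_del = A.copy()
A_del[2, 5] = A_del[5, 2] = 0.0              # monotone change: remove an inter-community edge

gain_add = objective(A_add, X_hat, 0.3) - objective(A, X_hat, 0.3)
gain_del = objective(A_del, X_hat, 0.3) - objective(A, X_hat, 0.3)
print(gain_add, gain_del)
```

Under these assumptions the first gain equals 2 and the second equals 2/(r-1), matching the two cases of the proof; any other feasible matrix gains at most as much because its entries lie in [-1/(r-1), 1].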

Recall that in the semirandom planted partition model, a random graph is generated according 
to the random (planted partition) model and then a monotone adversary is allowed to make 
monotone changes. Once we have established our main result (Theorem 2) on the success of our SDPs 
against the random model, it is an immediate corollary of robustness that our SDPs also succeed 
against the semirandom model. 

Proposition 8. Programs 4 (known sizes) and 5 (unknown sizes) achieve exact recovery against 
the semirandom planted partition model, with probability $1 - o(1)$, up to the information-theoretic 
threshold for the random model (given in Theorem 2). 

4.3 BM-ordering and strongly assortative block models 

Let SBM($n, Q, \pi$) and PPM($n, p, q, \pi$) denote the stochastic block model and planted partition 
model, respectively. Given input from a stochastic block model SBM($n, Q, \pi$), we can simulate 
certain other stochastic block models SBM($n, Q', \pi$), by adding edges within communities 
independently at random, and likewise removing edges between communities. Specifically, we can 
simulate any block model for which $Q'_{ii} \ge Q_{ii}$ and $Q'_{ij} \le Q_{ij}$ for all communities $i \ne j$; in this 
case, following [AL14], we say that SBM($n, Q', \pi$) dominates SBM($n, Q, \pi$) in block model ordering 
(BM-ordering). 

In the case when the original model is a planted partition model, this simulation step can be 
thought of as a specific monotone adversary. Block models that dominate a planted partition model 
will fall among the strongly assortative class: those for which $Q_{ii} > Q_{jk}$ whenever $j \ne k$, i.e. all 
intra-community probabilities exceed all inter-community probabilities. The following is immediate 
from Proposition 8. 

Proposition 9. Programs 4 and 5 achieve exact recovery with high probability against any strongly 
assortative block model that dominates a planted partition model lying within the 
information-theoretically feasible range. 

Extrapolating slightly, we can assert that this extension to the strongly assortative block model 
tends to be automatic for semidefinite approaches because they tend to be robust. This has been 
previously noted by [AL14] through essentially the same arguments. By taking $p = \min_i Q_{ii}$ and 
$q = \max_{j \ne k} Q_{jk}$, we could obtain from Proposition 9 a more explicit description of this sufficient 
condition for exact recovery. 

4.4 Difficulties with general block models 

Most natural SDPs tend to be robust to monotone adversaries. This strength of semidefinite 
approaches - their ability to adapt to other random models following BM-ordering - can be used 
to also reveal their limitations. We will show in this section that it is impossible for an algorithm 
robust to monotone changes to match the information-theoretic lower bound of [AS15a] in general, 
and even for strongly assortative block models. 
As an example, define 

$$Q_1 = \begin{pmatrix} a & b & c+\epsilon \\ b & a & c \\ c+\epsilon & c & a \end{pmatrix}, \qquad Q_2 = \begin{pmatrix} a & b & c \\ b & a & c \\ c & c & a \end{pmatrix}.$$ 

For a suitable setting of $\epsilon > 0$ and $a, b, c, \pi$, it is possible for SBM($n, Q_1, \pi$) to be 
information-theoretically feasible for exact recovery while SBM($n, Q_2, \pi$) is not. The following values provide 
an explicit example: 

$a = 31.4$, $b = 15$, $c = 10$, $\epsilon = 1$, $\pi = (1/3, 1/3, 1/3)$. 

However, SBM($n, Q_2, \pi$) can be simulated from SBM($n, Q_1, \pi$) by monotone changes (randomly 
removing edges between the first and third communities), so the two models are related by 
BM-ordering. As our SDPs cannot hope to recover against SBM($n, Q_2, \pi$), it follows by robustness 
that they fail also against SBM($n, Q_1, \pi$), even though this model is information-theoretically 
feasible. 
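
The explicit values above can be checked with a short numerical sketch. It assumes, as in the appendix on the CH-divergence, that the divergence is $D_+(i,j) = \sup_{t \in [0,1]} \sum_k \pi_k (t Q_{ik} + (1-t) Q_{jk} - Q_{ik}^t Q_{jk}^{1-t})$, that the entries of $Q_1$, $Q_2$ are the constant factors of the edge probabilities, and that the feasibility threshold is 1; the function name `ch_divergence` and the grid search over $t$ are illustrative choices, not the paper's.

```python
import numpy as np

def ch_divergence(Q, pi, i, j, grid=2001):
    """Grid approximation of sup over t in [0,1] of the CH-divergence between communities i and j."""
    Q = np.asarray(Q, dtype=float)
    pi = np.asarray(pi, dtype=float)
    t = np.linspace(0.0, 1.0, grid)[:, None]          # column of candidate t values
    qi, qj = Q[i][None, :], Q[j][None, :]             # rows i and j of Q
    terms = pi[None, :] * (t * qi + (1 - t) * qj - qi**t * qj**(1 - t))
    return float(terms.sum(axis=1).max())

a, b, c, eps = 31.4, 15.0, 10.0, 1.0
pi = [1 / 3, 1 / 3, 1 / 3]
Q1 = [[a, b, c + eps], [b, a, c], [c + eps, c, a]]
Q2 = [[a, b, c], [b, a, c], [c, c, a]]

d1 = ch_divergence(Q1, pi, 0, 1)   # first two communities under Q1
d2 = ch_divergence(Q2, pi, 0, 1)   # first two communities under Q2
print(d1, d2)
```

With these values the divergence between the first two communities comes out just above 1 for $Q_1$ and just below 1 for $Q_2$: the small $\epsilon$ term contributed by the third community is exactly what tips $Q_1$ over the threshold.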

This example shows how monotone changes become subtly unhelpful in the strongly assortative 
block model: the first two communities become harder to distinguish under these monotone changes 
because their interactions with the third community become more similar. Arguably this makes the 
semirandom model inappropriate for such a general block model. Nonetheless, monotone robustness 
is a property of our SDPs, and also of all prior SDPs in the community detection literature (at least 
after minor strengthening), and so the limitations below apply at least to these specific SDPs. Thus 
we are able to learn about the limitations of these SDPs by studying their robustness properties. 

The argument above applies to any algorithm that is robust to the semirandom model; this 
means no robust algorithm can achieve the threshold. This motivates us to conjecture a different 
“monotone threshold” for general block models, which we believe captures the information-theoretic 
limits in the general semirandom model. Define the monotone divergence 

$$D_+^m(i,j) = \sup_{t \in [0,1]} \sum_{k \in \{i,j\}} \pi_k \left( t Q_{ik} + (1-t) Q_{jk} - Q_{ik}^t Q_{jk}^{1-t} \right).$$ 

Note that $D_+^m(i,j)$ is simply the value of $D_+(i,j)$ after setting $Q_{ik} = Q_{jk}$ for all $k \notin \{i,j\}$; this 
is a change in model that the monotone adversary can simulate (for instance set $Q_{ik} = Q_{jk} = 0$), 
and it is in fact the best change-in-model that the adversary can simulate if it wants to decrease 
$D_+(i,j)$ as much as possible for a specific $(i,j)$ pair. It follows that if $D_+^m(i,j) < 1$ for some 
$i \ne j$ then there does not exist a robust algorithm achieving exact recovery. We conjecture that 
conversely, if $D_+^m(i,j) > 1$ for all $i \ne j$, and if the block model is furthermore weakly assortative 
($Q_{ii} > Q_{ij}$ for all $i \ne j$), then there exists a robust algorithm achieving exact recovery against this 
block model. 
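
As a rough illustration of the monotone divergence, the sketch below restricts the CH-divergence sum to $k \in \{i, j\}$ and evaluates it on a three-community matrix with constant entries 31.4, 15, 10 and one perturbed entry. The function name `monotone_divergence`, the grid search, and the reading of the matrix entries as constant factors are assumptions made for this example only.

```python
import numpy as np

def monotone_divergence(Q, pi, i, j, grid=2001):
    """Grid approximation of the divergence with the sum restricted to k in {i, j}."""
    Q = np.asarray(Q, dtype=float)
    pi = np.asarray(pi, dtype=float)
    t = np.linspace(0.0, 1.0, grid)[:, None]
    ks = [i, j]                                        # only the two communities themselves
    qi, qj = Q[i, ks][None, :], Q[j, ks][None, :]
    terms = pi[ks][None, :] * (t * qi + (1 - t) * qj - qi**t * qj**(1 - t))
    return float(terms.sum(axis=1).max())

a, b, c, eps = 31.4, 15.0, 10.0, 1.0
pi = [1 / 3, 1 / 3, 1 / 3]
Q1 = [[a, b, c + eps], [b, a, c], [c + eps, c, a]]

md = monotone_divergence(Q1, pi, 0, 1)
print(md)
```

Here the value falls below 1 even though the perturbed entry distinguishes the first two communities: the restricted sum ignores the third community entirely, which is precisely the information a monotone adversary can erase.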


4.5 Difficulties with unknown parameters 

Non-semidefinite techniques in [AS15b] achieve exact recovery up to the threshold without knowing 
any of the model parameters. One might ask whether it is possible for a robust algorithm (such 
as our SDPs) to achieve this; we now argue that this is not possible in general even in the planted 
partition model. 

Consider for example a strongly assortative block model SBM($n, Q, \pi$), on four communities, 
where 

$$Q = \begin{pmatrix} a & b & c & c \\ b & a & c & c \\ c & c & a & b \\ c & c & b & a \end{pmatrix}$$ 

and $a > b > c$. This model may be simulated by a monotone adversary acting on either of the 
planted partition models PPM($n, a, b, \pi$) and PPM($n, b, c, \pi'$), where $\pi' = (\pi_1 + \pi_2, \pi_3 + \pi_4)$. Suppose 
that we had a robust algorithm for exact recovery in the planted partition model without knowing 
any of the parameters. For a suitable setting of $a, b, c, \pi$, such an algorithm should be able to 
achieve exact recovery against the two planted partition models listed above. By robustness, it will 
still recover the same partition, with high probability, when presented with the strongly assortative 
block model SBM($n, Q, \pi$). But now we have a contradiction: the algorithm allegedly recovers both 
partitions (corresponding to $\pi$ and $\pi'$) with high probability. 
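
The claim that one block model is reachable from two different planted partition models can be checked entry by entry. The sketch below is illustrative only: the helper names, the numeric values 3 > 2 > 1 standing in for $a > b > c$, and the representation of each planted partition model as a 4 x 4 block matrix over the four fine communities are assumptions of this example.

```python
import numpy as np

def ppm_block_matrix(p, q, planted):
    """Block matrix of a planted partition model, written over the fine blocks.

    planted[k] is the planted community (of the starting model) containing fine block k.
    """
    planted = np.asarray(planted)
    same = planted[:, None] == planted[None, :]
    return np.where(same, float(p), float(q))

def reachable_by_monotone_adversary(Q_start, Q_target, planted):
    """True if Q_target only raises probabilities inside planted communities
    and only lowers them between planted communities."""
    Q_start = np.asarray(Q_start, dtype=float)
    Q_target = np.asarray(Q_target, dtype=float)
    planted = np.asarray(planted)
    same = planted[:, None] == planted[None, :]
    within_ok = np.all(Q_target[same] >= Q_start[same])
    between_ok = np.all(Q_target[~same] <= Q_start[~same])
    return bool(within_ok and between_ok)

a, b, c = 3.0, 2.0, 1.0                      # hypothetical values with a > b > c
Q = [[a, b, c, c], [b, a, c, c], [c, c, a, b], [c, c, b, a]]
fine, coarse = [0, 1, 2, 3], [0, 0, 1, 1]

from_fine = reachable_by_monotone_adversary(ppm_block_matrix(a, b, fine), Q, fine)
from_coarse = reachable_by_monotone_adversary(ppm_block_matrix(b, c, coarse), Q, coarse)
print(from_fine, from_coarse)
```

Both checks succeed: starting from the four-community model the adversary only removes edges between communities, and starting from the two-community model it only adds edges inside communities.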

In effect, these two planted partition models have zero “monotone total variation distance”, 
though we do not formalize this notion here. It is necessary to know some model parameters in 
advance in order for robust algorithms to distinguish such models. A few approaches are available 
to overcome this drawback: 

• One could statistically estimate some or all of the parameters before running the SDP, as in 
Appendix B of [HWX15b]. However, this statistical approach relies on the specific random 
model and spoils our robustness guarantees. 

• One could try running the SDP several times on a range of possible input parameters, ignoring 
any returned solutions that are not partition matrices. A close reading of Section 5 reveals 
that, when running Program 5 (unknown sizes), mis-guessing the parameter $\omega$ by any $1 - o(1)$ 
factor does not affect whether one succeeds with high probability. 

This approach may return several valid solutions. In the example above, this approach will 
recover both of the given planted partition models, with high probability. In general this 
approach recovers the type of hierarchical community structure that the above example 
exhibits. 


5 Proof of exact recovery 

In this section we prove our main result (Theorem 2) which states that our SDPs achieve exact 
recovery against the planted partition model, up to the information-theoretic limit. Specifically, 
we show that if the divergence condition (3) holds, then with high probability, the true centered 
partition matrix $\hat X$ is the unique optimum for our SDPs. The main idea of the proof is to construct 
a solution to the dual SDP in order to bound the value of the primal. 

5.1 Notation 

Recall that we use the letters $u, v$ for vertices and the letters $i, j$ for communities. We let $\mathbf{1}$ denote 
the all-ones vector, $1_i$ denote the indicator vector of $S_i$, $I$ denote the identity matrix, and $J$ denote 
the all-ones matrix. When $M$ is any matrix, $M_{S_i S_j}$ will denote the submatrix indexed by $S_i \times S_j$, 
and we abbreviate $M_{S_i S_i}$ by $M_{S_i}$. 

Let $A$ be the adjacency matrix of the observed graph and write $E(i,j) = 1_i^\top A 1_j$; when $i \ne j$ 
this is the number of edges between communities $i$ and $j$, and when $i = j$ this is twice the number 
of edges within community $i$. 

All asymptotic notation refers to the limit $n \to \infty$, with parameters $p$, $q$, and $\pi$ held fixed. 
Throughout, we say an event occurs “with high probability” if its probability is $1 - o(1)$. 
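5.2 Weak duality 

Following the standard dual certificate approach, we begin by writing down duality and 
complementary slackness for our SDP. This will lead us to a set of sufficient conditions, outlined in 
Proposition 10 below. 

We first write down the dual of Program 5: 

Program 6 (Dual, unknown sizes). 

minimize $\sum_v \nu_v + \frac{1}{r-1} \langle J, \Gamma \rangle$ 

subject to $\Lambda := \mathrm{diag}(\nu) + \omega J - A - \Gamma \succeq 0$, 

$\Gamma \ge 0$ symmetric. 

Here the $n$-vector $\nu$ and the $n \times n$ matrix $\Gamma$ (both indexed by vertices) are dual variables, and 
$\omega$ is as defined in (1). We can now state weak duality in this context: 

$$\langle A, X \rangle - \omega \langle J, X \rangle = \langle A - \omega J, X \rangle = \langle \mathrm{diag}(\nu) - \Gamma - \Lambda, X \rangle$$ 
$$= \sum_v \nu_v - \langle \Gamma, X \rangle - \langle \Lambda, X \rangle$$ 
$$= \sum_v \nu_v + \frac{1}{r-1} \langle J, \Gamma \rangle - \left\langle \Gamma, X + \frac{1}{r-1} J \right\rangle - \langle \Lambda, X \rangle.$$ 

This implies weak duality $\langle A, X \rangle - \omega \langle J, X \rangle \le \sum_v \nu_v + \frac{1}{r-1} \langle J, \Gamma \rangle$ (the primal objective value is 
at most the dual objective value) because $\langle \Gamma, X + \frac{1}{r-1} J \rangle \ge 0$ (since $\Gamma \ge 0$, $X + \frac{1}{r-1} J \ge 0$) and 
$\langle \Lambda, X \rangle \ge 0$ (since $\Lambda \succeq 0$, $X \succeq 0$). 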
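
The chain of equalities behind weak duality is a purely algebraic identity for any symmetric $X$ with unit diagonal, so it can be sanity-checked numerically. The sketch below is an illustration under assumed toy data: the dual variables are drawn at random (so $\Lambda$ need not be positive semidefinite here), and all variable names are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, omega = 6, 3, 0.4
labels = np.array([0, 0, 1, 1, 2, 2])

# Centered partition matrix: 1 within communities, -1/(r-1) across.
X = np.where(labels[:, None] == labels[None, :], 1.0, -1.0 / (r - 1))

# Arbitrary simple graph and arbitrary dual variables (not optimised).
upper = np.triu(rng.integers(0, 2, (n, n)), 1)
A = (upper + upper.T).astype(float)
nu = rng.normal(size=n)
G = rng.random((n, n))
Gamma = (G + G.T) / 2                      # symmetric and entry-wise nonnegative
J = np.ones((n, n))
Lam = np.diag(nu) + omega * J - A - Gamma  # Lambda as in the dual program

def inner(M, N):
    """Frobenius inner product <M, N>."""
    return float(np.sum(M * N))

primal_value = inner(A, X) - omega * inner(J, X)
dual_value = nu.sum() + inner(J, Gamma) / (r - 1)
gap = inner(Gamma, X + J / (r - 1)) + inner(Lam, X)
print(primal_value, dual_value - gap)
```

The two printed numbers agree up to rounding. Weak duality itself additionally needs the gap to be nonnegative, which holds once $\Lambda \succeq 0$ is imposed; the random dual variables above do not guarantee that.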

5.3 Complementary slackness 

From above we have the following complementary slackness conditions. If $X$ is primal feasible and 
$(\nu, \Gamma)$ is dual feasible then $X$ and $(\nu, \Gamma)$ have the same objective value if and only if $\langle \Gamma, X + \frac{1}{r-1} J \rangle = 0$ 
and $\langle \Lambda, X \rangle = 0$. Since $\Lambda$ and $X$ are PSD, $\langle \Lambda, X \rangle = 0$ is equivalent to $\Lambda X = 0$ (this can be shown 
using the rank-1 decomposition of PSD matrices), which in turn is equivalent to $\mathrm{colspan}(X) \subseteq \ker(\Lambda)$. 

Although we have only considered Program 5 so far, everything we have done also applies to 
Program 4. The dual of Program 4 is identical to Program 6 except that $\omega$ is replaced by a dual 
variable, and there is a corresponding term in the objective. By deterministically choosing this 
dual variable to take the value $\omega$, we arrive at a dual program with the same feasible region and 
complementary slackness conditions as Program 6. From this point onward, the same arguments 
apply to both Programs 4 and 5. 

Let $\hat X$ be the true centered partition matrix with $(1, -\frac{1}{r-1})$ entries. The following proposition 
gives a sufficient condition for $\hat X$ to be the unique optimum for Programs 4 and 5. 

Proposition 10. Suppose there exists a dual solution $(\nu, \Gamma)$ satisfying: 

• $\Lambda \succeq 0$, 

• $\Gamma_{S_i} = 0$ for all $i$, 

• $\Gamma_{S_i S_j} > 0$ (entry-wise) for all $i \ne j$, 

• $\ker(\Lambda) = \mathrm{span}\{1_i - 1_j\}_{i,j}$. 

Then $\hat X$ is the unique optimum for Programs 4 and 5. (Here $\Lambda$ is defined as $\Lambda = \mathrm{diag}(\nu) + \omega J - A - \Gamma$ 
as in Program 6.) 

Proof. The first three assumptions imply that $(\nu, \Gamma)$ is dual feasible. The column span of $\hat X$ is 
$\mathrm{span}\{r 1_i - \mathbf{1}\}_i = \mathrm{span}\{1_i - 1_j\}_{i,j}$, so the fourth assumption is that $\mathrm{colspan}(\hat X) = \ker(\Lambda)$, which is one 
of our two complementary slackness conditions. The assumption $\Gamma_{S_i} = 0$ implies $\langle \Gamma, \hat X + \frac{1}{r-1} J \rangle = 0$ 
because $\hat X + \frac{1}{r-1} J$ is supported on the diagonal blocks. This is the other complementary slackness 
condition, so complementary slackness holds, certifying that $\hat X$ is primal optimal and $(\nu, \Gamma)$ is dual 
optimal. 

To show uniqueness, suppose $X$ is any optimal primal solution. By complementary slackness, 
$\mathrm{colspan}(X) \subseteq \ker(\Lambda) = \mathrm{span}\{1_i - 1_j\}_{i,j}$ and $\langle \Gamma, X + \frac{1}{r-1} J \rangle = 0$. Since $\Gamma_{S_i S_j} > 0$, this means 
$X_{S_i S_j} = -\frac{1}{r-1} J$ for all $i \ne j$. But since every column of $X$ is in $\mathrm{span}\{1_i - 1_j\}_{i,j}$, we must now have 
$X_{S_i} = J$ and so $X = \hat X$. □ 

We note that Proposition 10 is not novel in that all the arguments we have made so far are 
standard in the dual certificate approach. 

5.4 Construction of dual certificate — overview 

We now explore the space of dual certificates that will satisfy the conditions of Proposition 10 so 
as to sound out how to construct such a certificate. The main result of this section is to rewrite 
the problem in terms of a new set of variables $\gamma_v$. We believe this change of variables is novel, 
and it is crucial to our approach because it will allow us to make a connection between 
complementary slackness and certain differences of binomial variables that are closely related to 
the information-theoretic threshold (see Lemma 5). 

The condition $\mathrm{span}\{1_i - 1_j\} \subseteq \ker(\Lambda)$ is equivalent to 

$$\forall u,\ \forall i \ne j: \quad e_u^\top \Lambda (1_i - 1_j) = 0.$$ 

Using the definition $\Lambda = \mathrm{diag}(\nu) + \omega J - A - \Gamma$ (and $\Gamma_{S_i} = 0$), this can be rewritten as the two equations 

$$\forall i \ne j,\ \forall u \in S_i: \quad \nu_u + \omega(s_i - s_j) - E(u,i) + E(u,j) + \sum_{v \in S_j} \Gamma_{uv} = 0 \qquad (5)$$ 

and 

$$\forall i \ne j,\ \forall u \notin S_i,\ u \notin S_j: \quad \omega(s_i - s_j) - E(u,i) + E(u,j) - \sum_{v \in S_i} \Gamma_{uv} + \sum_{v \in S_j} \Gamma_{uv} = 0. \qquad (6)$$ 

We can disregard the equations (6) because they are implied by the equations (5) via subtraction. 

From (5) we have that, for any fixed $u \in S_i$, the quantity $\omega s_j - E(u,j) - \sum_{v \in S_j} \Gamma_{uv}$ must be 
independent of $j$ (for $j \ne i$). Hence let us define 

$$\gamma_u = \omega s_j - E(u,j) - \sum_{v \in S_j} \Gamma_{uv} \quad \forall j \ne i, \text{ for } u \in S_i. \qquad (7)$$ 

Rewrite (5) as 

$$\nu_u = E(u,i) - \omega s_i + \gamma_u \quad \text{for } u \in S_i, \qquad (8)$$ 

and rewrite (7) as 

$$R_{uj} = \omega s_j - E(u,j) - \gamma_u \quad \text{for } u \notin S_j, \qquad (9)$$ 

where $R_{uj}$ is shorthand for the row sum $\sum_{v \in S_j} \Gamma_{uv}$. Since $\Gamma$ is symmetric, $R_{uj}$ must be equal to 
the column sum $\sum_{v \in S_j} \Gamma_{vu}$. So we need, for any $i \ne j$, 

$$\sum_{u \in S_i} R_{uj} = \sum_{v \in S_j} R_{vi},$$ 

or equivalently: 

$$\sum_{u \in S_i} [\omega s_j - E(u,j) - \gamma_u] = \sum_{v \in S_j} [\omega s_i - E(v,i) - \gamma_v],$$ 

or equivalently: 

$$\omega s_i s_j - E(i,j) - \sum_{u \in S_i} \gamma_u = \omega s_i s_j - E(i,j) - \sum_{v \in S_j} \gamma_v,$$ 

or equivalently, there needs to exist a constant $c$ such that 

$$\sum_{u \in S_i} \gamma_u = c \quad \forall i. \qquad (10)$$ 

To recap, it remains to do the following. First choose $c$. Then choose $\gamma_u$ satisfying (10). Defining 
$R_{uj}$ by (9), we are now guaranteed that $\{R_{uj}\}_{u \in S_i}$ and $\{R_{vi}\}_{v \in S_j}$ are valid row and column sums 
respectively for $\Gamma_{S_i S_j}$ (for $i \ne j$). Then define $\nu_u$ by (8), which guarantees $\mathrm{span}\{1_i - 1_j\} \subseteq \ker(\Lambda)$. 

It remains to construct $\Gamma_{S_i S_j}$ explicitly from its row and column sums such that $\Gamma_{S_i S_j} > 0$. It also 
remains to show $\Lambda \succeq 0$ and $\ker(\Lambda) \subseteq \mathrm{span}\{1_i - 1_j\}$. Note that we have not actually chosen any 
values for dual variables yet, other than what is required by complementary slackness; we have 
merely rewritten the complementary slackness conditions in terms of the new variables $\gamma_u$ and $R_{uj}$. 
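
The recipe in the recap can be exercised on a toy graph to confirm that it forces the vectors $1_i - 1_j$ into the kernel of $\Lambda$. The sketch below is only an illustration under stated assumptions: the graph, the values of `omega` and `c`, and the particular $\gamma$ (a constant per community plus a zero-sum perturbation) are arbitrary, and the blocks of $\Gamma$ are filled in with the rank-one choice that has the prescribed row and column sums. No attempt is made here to enforce positivity of $\Gamma$ or positive semidefiniteness of $\Lambda$.

```python
import numpy as np

sizes = [4, 3, 3]                                  # hypothetical community sizes
labels = np.repeat(np.arange(len(sizes)), sizes)
n, r = len(labels), len(sizes)
Z = np.eye(r)[labels]                              # column i of Z is the indicator vector 1_i
s = np.array(sizes, dtype=float)

# Toy graph: cliques inside communities plus two cross edges.
A = (labels[:, None] == labels[None, :]).astype(float)
np.fill_diagonal(A, 0.0)
for u, v in [(0, 4), (5, 8)]:
    A[u, v] = A[v, u] = 1.0

omega, c = 0.5, 1.0                                # arbitrary illustrative values
E = A @ Z                                          # E[u, j] = number of neighbours of u in S_j

# Choose gamma with sum over each community equal to c, as required by (10).
gamma = np.empty(n)
for i in range(r):
    idx = np.where(labels == i)[0]
    bump = 0.1 * (np.arange(len(idx)) - (len(idx) - 1) / 2)   # sums to zero
    gamma[idx] = c / s[i] + bump

nu = E[np.arange(n), labels] - omega * s[labels] + gamma      # equation (8)
R = omega * s[None, :] - E - gamma[:, None]                   # equation (9), used for u not in S_j

Gamma = np.zeros((n, n))
for i in range(r):
    for j in range(r):
        if i != j:
            Si, Sj = labels == i, labels == j
            T_ij = R[Si, j].sum()                              # total mass of the (i, j) block
            Gamma[np.ix_(Si, Sj)] = np.outer(R[Si, j], R[Sj, i]) / T_ij

Lam = np.diag(nu) + omega * np.ones((n, n)) - A - Gamma
residual = max(np.abs(Lam @ (Z[:, i] - Z[:, j])).max()
               for i in range(r) for j in range(r) if i != j)
print(residual)
```

The residual is zero up to rounding for any $\gamma$ whose community sums all equal $c$, which is the content of the derivation above; the remaining work of the proof is to choose $\gamma$ so that the other conditions also hold.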

5.5 Intervals for $\gamma_v$ 

In this section we find necessary bounds for $\gamma_v$, which will guide our choice of these dual variables 
and of $c$. This is where the crucial connection between $\gamma_v$ and the information-theoretic threshold 
will become apparent. Let $v \in S_i$. For a lower bound on $\gamma_v$, we have that $\Lambda \succeq 0$ implies $\Lambda_{vv} \ge 0$ 
implies $\nu_v + \omega \ge 0$ which by (8) implies $\gamma_v \ge \omega(s_i - 1) - E(v,i)$. For an upper bound, for any $j \ne i$ 
we must have that $\Gamma_{S_i S_j} > 0$ implies $R_{vj} > 0$ which by (9) implies $\gamma_v < \omega s_j - E(v,j)$. Therefore, 
$\gamma_v$ must lie in the interval 

$$\gamma_v \in \left[\, \omega(s_i - 1) - E(v,i)\ ,\ \min_{j \ne i}\ \omega s_j - E(v,j) \right).$$ 

Our approach in choosing $\gamma_v$ will be to first make a preliminary guess $\gamma'_v$, and then add an adjustment 
term to ensure that (10) holds. In order to absorb this adjustment, we will aim for $\gamma'_v$ to lie in the 
slightly smaller interval 

$$\gamma'_v \in [\alpha_v, \beta_v] \qquad (11)$$ 

where 

$$\alpha_v = \omega(s_i - 1) - E(v,i) + \epsilon_1 \quad \text{and} \quad \beta_v = \min_{j \ne i}\ \omega s_j - E(v,j) - \epsilon_2 \quad \text{for } v \in S_i.$$ 

Here $\epsilon_1$ and $\epsilon_2$ are small $o(\log n)$ error terms which we will choose later. 

The non-emptiness of these intervals is the crux of the proof, and provides the connection to 
the information-theoretic threshold: 
Lemma 11. If the divergence condition (3) holds, then $\alpha_v < \beta_v$ for all $v$, with high probability. 

Proof. For $v \in S_i$, we have 

$$\beta_v - \alpha_v = \min_{j \ne i} \left[ (E(v,i) - E(v,j)) - \omega(s_i - s_j) \right] + \omega - \epsilon_1 - \epsilon_2$$ 
$$= \min_{j \ne i} \left[ (E(v,i) - E(v,j)) - \omega(s_i - s_j) \right] - o(\log n)$$ 
$$= \min_{j \ne i} \left[ (E(v,i) - E(v,j)) - \tau(\pi_i - \pi_j) \log n \right] - o(\log n).$$ 

By Lemma 5, for each $v \in S_i$ and all $j \ne i$, the probability of the tail event 

$$(E(v,i) - E(v,j)) - \tau(\pi_i - \pi_j) \log n - o(\log n) \le 0$$ 

is $n^{-D_+(i,j) + o(1)}$. Thus, when $D_+(i,j) > 1$ for all pairs $(i,j)$, we can take a union bound over all 
$n(r-1)$ such events, to find that $\beta_v - \alpha_v > 0$ for all $v$ with probability $1 - o(1)$. □ 

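By summing (11) over all $v \in S_i$ (roughly following (10), although $\gamma_v$ and $\gamma'_v$ are slightly 
different), we obtain a target interval for $c$: 

$$c \in [\alpha_i, \beta_i] \quad \forall i \qquad (12)$$ 

where 

$$\alpha_i = \sum_{v \in S_i} \alpha_v = \omega s_i (s_i - 1) - E(i,i) + s_i \epsilon_1,$$ 
$$\beta_i = \sum_{v \in S_i} \beta_v = \sum_{v \in S_i} \min_{j \ne i}\, [\omega s_j - E(v,j) - \epsilon_2].$$ 

The endpoints of the interval (12) for $c$ will turn out to be highly concentrated near a pair of 
deterministic quantities, namely: 

$$\bar\alpha_i = (\omega - p) s_i (s_i - 1) + s_i \epsilon_1 \quad \text{and} \quad \bar\beta_i = (\omega - q) s_i s_{\min_{j \ne i}} - s_i \epsilon_2 \qquad (13)$$ 

where $s_{\min_{j \ne i}} = \min_{j \ne i} s_j$. 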

5.6 Choice of $c$ and $\gamma_v$ 

We have now made the key insight that $c$ and $\gamma'_v$ must lie in certain intervals in order to get all 
the way to the information-theoretic threshold. However, as long as we fulfill these requirements, 
we have some “wiggle room” in choosing the dual variables. We will make what seems to be the 
simplest choices. We can deterministically take 

$$c = \tfrac{1}{2} (\omega - q)\, s_{\min}\, s_{2\mathrm{ndmin}} \qquad (14)$$ 

where $s_{\min}$, $s_{2\mathrm{ndmin}}$ are the sizes of the two smallest communities (which may be equal). Then for 
sufficiently large $n$ we have, for all $i$, 

$$\bar\alpha_i < 0 < c < \bar\beta_i,$$ 

using the definitions (13) of $\bar\alpha_i$, $\bar\beta_i$ along with the facts $\epsilon_1, \epsilon_2 = o(\log n)$ and $q < \omega < p$ (Lemma 1). 
Our specific choice of $c$ is not crucial; we can in fact pick any deterministic $0 < c < \min_i \bar\beta_i$ provided 
that $c = \Theta(n \log n)$ and $\bar\beta_i - c = \Theta(n \log n)$ for all $i$. 
Recall that our goal is to choose each $\gamma_v$ to lie in (or close to) the interval $[\alpha_v, \beta_v]$ subject to 
the condition $\sum_{v \in S_i} \gamma_v = c$ required by (10). To achieve this, define the deterministic quantity 

$$\kappa_i = \frac{c - \bar\alpha_i}{\bar\beta_i - \bar\alpha_i} \in (0, 1). \qquad (15)$$ 

Note that we expect $c$ to lie roughly $\kappa_i$ fraction of the way through the interval $[\alpha_i, \beta_i]$ (which is 
the sum over $v \in S_i$ of the intervals $[\alpha_v, \beta_v]$). Mirroring this, we make a rough initial choice $\gamma'_v$ (for 
$v \in S_i$) that is $\kappa_i$ fraction of the way through the interval $[\alpha_v, \beta_v]$: 

$$\gamma'_v = (1 - \kappa_i) \alpha_v + \kappa_i \beta_v. \qquad (16)$$ 

However, these do not satisfy $\sum_{v \in S_i} \gamma'_v = c$ on the nose; rather, there is some error that is on the 
order of the difference between $\alpha_i$ and $\bar\alpha_i$. We thus introduce an additive correction term $\delta_i$ chosen 
to guarantee $\sum_{v \in S_i} \gamma_v = c$: for $v \in S_i$, 

$$\gamma_v = \gamma'_v + \delta_i, \qquad \delta_i = \frac{1}{s_i} \Big( c - \sum_{v \in S_i} \gamma'_v \Big).$$ 

Recall that our goal was for $\gamma_v$ to lie within some $o(\log n)$ error from the interval $[\alpha_v, \beta_v]$. By 
construction we have $\gamma'_v \in [\alpha_v, \beta_v]$ and so we will have succeeded if we can show $\delta_i = o(\log n)$. This 
will be one of the goals of the next section. 
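
The interpolation-plus-correction step can be written out in a few lines. The following is a minimal sketch under assumed toy inputs: the function name `choose_gamma` and all numeric values are hypothetical, and the interval endpoints are supplied directly rather than computed from a graph.

```python
import numpy as np

def choose_gamma(alpha, beta, labels, alpha_bar, beta_bar, c):
    """Interpolate inside each interval [alpha_v, beta_v] by kappa_i, then shift
    each community by a common delta_i so that the community sums equal c."""
    alpha, beta, labels = np.asarray(alpha), np.asarray(beta), np.asarray(labels)
    alpha_bar, beta_bar = np.asarray(alpha_bar), np.asarray(beta_bar)
    kappa = (c - alpha_bar) / (beta_bar - alpha_bar)             # one value per community
    gamma_prime = (1 - kappa[labels]) * alpha + kappa[labels] * beta
    sizes = np.bincount(labels)
    delta = (c - np.bincount(labels, weights=gamma_prime)) / sizes
    return gamma_prime + delta[labels], gamma_prime, delta

# Hypothetical toy data: two communities of sizes 3 and 2.
labels = [0, 0, 0, 1, 1]
alpha = [-3.0, -2.5, -3.5, -2.0, -2.2]
beta = [2.0, 2.5, 1.5, 3.0, 2.8]
alpha_bar, beta_bar, c = [-9.0, -4.0], [6.0, 6.0], 1.0

gamma, gamma_prime, delta = choose_gamma(alpha, beta, labels, alpha_bar, beta_bar, c)
community_sums = np.bincount(labels, weights=gamma)
print(gamma, delta, community_sums)
```

By construction every community sum equals `c` exactly, and each preliminary value lies in its interval because it is a convex combination of the endpoints; what the proof still has to show is that the shift `delta` is small.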


5.7 High-probability bounds for random variables 

In this section we establish bounds on various variables in the dual certificate that will hold with 
high probability $1 - o(1)$. We can treat the failure of these bounds as a low-probability failure event 
for the algorithm. 

First recall the following version of the Bernstein inequality: 

Lemma 12 (Bernstein inequality). If $X_1, \ldots, X_k$ are independent zero-mean random variables with 
$|X_i| \le 1$, then for any $t > 0$, 

$$\Pr\Big[ \sum_i X_i \ge t \Big] \le \exp\left( - \frac{\tfrac{1}{2} t^2}{\sum_i \mathrm{Var}[X_i] + \tfrac{1}{3} t} \right).$$ 

Note that by replacing $X_i$ with $-X_i$ we get the same bound for $\Pr\big[ \sum_i X_i \le -t \big]$. 
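
To get a feel for the magnitudes involved, the Bernstein bound can be evaluated directly. The sketch below assumes the standard form of the inequality with constants 1/2 and 1/3; the function names `bernstein_tail` and `deviation_tail` and the example parameters are assumptions for illustration, and the second function simply applies a union bound over a two-sided deviation for each of `r` communities.

```python
import math

def bernstein_tail(t, variance_sum):
    """Upper bound on Pr[sum X_i >= t] for independent zero-mean X_i with |X_i| <= 1."""
    return math.exp(-0.5 * t * t / (variance_sum + t / 3.0))

def deviation_tail(t, variance_sum, r):
    """Union bound over r binomial counts, two-sided, capped at 1."""
    return min(1.0, 2 * r * bernstein_tail(t, variance_sum))

# Hypothetical example: variance proxy 10, three communities.
for t in (5.0, 10.0, 20.0):
    print(t, bernstein_tail(t, 10.0), deviation_tail(t, 10.0, 3))
```

The bound decays like a Gaussian tail while `t` is small compared with the variance and like an exponential tail afterwards, which is why deviations on the order of log n need the more careful two-tier treatment used in this section.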

For each vertex $v$, let 

$$\Delta_v = \max_j \big| E(v,j) - \mathbb{E}[E(v,j)] \big| \qquad (17)$$ 

where $j$ ranges over all communities, including that of $v$. Recall that for $v \notin S_j$, $E(v,j) \sim \mathrm{Binom}(s_j, q)$ 
and so $\mathbb{E}[E(v,j)] = s_j q$; and for $v \in S_j$, $E(v,j) \sim \mathrm{Binom}(s_j - 1, p)$ and so $\mathbb{E}[E(v,j)] = (s_j - 1) p$. 
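Our motivation for defining the quantity $\Delta_v$ is its appearance in the following bounds, for $v \in S_i$: 

$$\left| \alpha_v - \frac{\bar\alpha_i}{s_i} \right| = \left| p(s_i - 1) - E(v,i) \right| \le \Delta_v,$$ 

$$\left| \beta_v - \frac{\bar\beta_i}{s_i} \right| = \left| \min_{j \ne i}\, [\omega s_j - E(v,j)] - (\omega - q) s_{\min_{j \ne i}} \right| \le \Delta_v,$$ 

$$\left| \gamma'_v - \frac{c}{s_i} \right| = \left| (1 - \kappa_i) \alpha_v + \kappa_i \beta_v - \frac{c}{s_i} \right| \le \left| (1 - \kappa_i) \frac{\bar\alpha_i}{s_i} + \kappa_i \frac{\bar\beta_i}{s_i} - \frac{c}{s_i} \right| + \Delta_v = \Delta_v, \qquad (18)$$ 

where the third bound makes use of the first two (and of the definition (15) of $\kappa_i$). 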

Toward bounding $\Delta_v$, note that we can apply Bernstein’s inequality (Lemma 12) to bound each 
$E(v,j) - \mathbb{E}[E(v,j)]$, and take a union bound to obtain 

$$\Pr[\Delta_v \ge t] \le 2r \exp\left( - \frac{\tfrac{1}{2} t^2}{np + \tfrac{1}{3} t} \right).$$ 

Taking $t = \log n \log\log n$ and union bounding over all $v$, we see that, with high probability, 
$\Delta_v \le \log n \log\log n$ for all $v$. But this will not quite suffice for the bounds we need. Instead, 
taking $t = \log n/(\log\log n)^2$, we see that $\Delta_v \le \log n/(\log\log n)^2$ for most values of $v$, with a 
number of exceptions that, with high probability, does not exceed $n^{1 - 1/(\log\log n)^5}$; for these 
exceptions, we fall back to the bound of $\log n \log\log n$ above. Above we have used the following 
consequence of Markov’s inequality: if there are $n$ bad events, each occurring with probability $\le p$, 
then $\Pr[\text{at least } k \text{ bad events occur}] \le np/k$. 

For the sake of quickly abstracting away this two-tiered complication, we make the following 
three computations up front: 

$$\sum_v \Delta_v \le n^{1 - 1/(\log\log n)^5} \log n \log\log n + n \frac{\log n}{(\log\log n)^2} = O(n \log n/(\log\log n)^2), \qquad (19)$$ 

$$\sum_v \Delta_v^2 \le n^{1 - 1/(\log\log n)^5} \log^2 n\, (\log\log n)^2 + n \frac{\log^2 n}{(\log\log n)^4} = O(n \log^2 n/(\log\log n)^4), \qquad (20)$$ 

$$\sum_{u,v} (\Delta_u + \Delta_v)^2 \le 8n \cdot n^{1 - 1/(\log\log n)^5} \log^2 n\, (\log\log n)^2 + 4n^2 \frac{\log^2 n}{(\log\log n)^4} = O(n^2 \log^2 n/(\log\log n)^4), \qquad (21)$$ 

where the sums range over all vertices $u, v$. Note that in each case, the non-exceptional vertices 
dominate the bound. (One can easily compare two terms in the above calculations by computing 
the logarithm of each.) 

Now we can show that $\delta_i$, the correction term from the previous section, is small. For any $i$ we 
have, using (18) and (19), 

$$|\delta_i| = \frac{1}{s_i} \Big| c - \sum_{v \in S_i} \gamma'_v \Big| = \frac{1}{s_i} \Big| \sum_{v \in S_i} \Big( \frac{c}{s_i} - \gamma'_v \Big) \Big| \le \frac{1}{s_i} \sum_{v \in S_i} \Delta_v = O(\log n/(\log\log n)^2). \qquad (22)$$ 

We will be interested in defining the quantity $\Delta'_v = \Delta_v + |\delta_i|$ (where $v \in S_i$) due to its appearance 
in the following bounds: 

$$\gamma_v = \gamma'_v + \delta_i = \frac{c}{s_i} \pm O(\Delta'_v), \qquad (23)$$ 

$$R_{vj} = \omega s_j - E(v,j) - \gamma_v = (\omega - q) s_j - \frac{c}{s_i} \pm O(\Delta'_v). \qquad (24)$$ 

Using the inequality $(x + y)^2 \le 2(x^2 + y^2)$ along with (20), (21) and (22), we have with high 
probability 

$$\sum_v (\Delta'_v)^2 = O(n \log^2 n/(\log\log n)^4), \qquad (25)$$ 

$$\sum_{u,v} (\Delta'_u + \Delta'_v)^2 = O(n^2 \log^2 n/(\log\log n)^4). \qquad (26)$$ 


5.8 Bounds on $\nu_v$ and $R_{vj}$ 

We can now prove two key results that we will need later: with high probability, 

$$\nu_v \ge \frac{\log n}{\log\log n} \quad \forall v \qquad (27)$$ 

and 

$$R_{vj} > 0 \quad \forall j,\ \forall v \notin S_j. \qquad (28)$$ 

These results should not come as a surprise because they were more or less the motivation for 
defining the interval $[\alpha_v, \beta_v]$ for $\gamma'_v$ in the first place. Since the values $\nu_v$ lie on the diagonal of 
$\Lambda$, the bound on $\nu_v$ is important for proving $\Lambda \succeq 0$ which we need for dual feasibility. Since $R_{vj}$ 
are the row sums of $\Gamma$, the bound on $R_{vj}$ is important for achieving $\Gamma_{S_i S_j} > 0$ which we need for 
Proposition 10. The specific quantity $\frac{\log n}{\log\log n}$ is not critical: anything would suffice that is $o(\log n)$ 
yet large enough to dominate some error terms in a later calculation. 

In the previous section (22) we showed $|\delta_i| = o(\log n)$ with high probability, and so 
we can choose the error terms $\epsilon_1, \epsilon_2$ from the definition of $\alpha_v, \beta_v$ (which, recall, are required to be 
$o(\log n)$) to absorb $\delta_i$. Specifically, let 

$$\epsilon_1 = \max_i |\delta_i| + \omega + \frac{\log n}{\log\log n} = o(\log n),$$ 
$$\epsilon_2 = \max_i |\delta_i| + 1 = o(\log n).$$ 

Recall that by Lemma 11 the intervals $[\alpha_v, \beta_v]$ are all nonempty with high probability, and by 
construction we have $\gamma'_v \in [\alpha_v, \beta_v]$. 

Now we will show (27): $\nu_v \ge \frac{\log n}{\log\log n}$. For $v \in S_i$ we have 

$$\nu_v = E(v,i) - \omega s_i + \gamma_v$$ 
$$= E(v,i) - \omega s_i + \gamma'_v + \delta_i$$ 
$$\ge E(v,i) - \omega s_i + \alpha_v + \delta_i$$ 
$$= E(v,i) - \omega s_i + \omega(s_i - 1) - E(v,i) + \epsilon_1 + \delta_i$$ 
$$\ge \frac{\log n}{\log\log n},$$ 

using the choice of $\epsilon_1$. 

Now we show (28): $R_{vj} > 0$. For $v \in S_i$ and $j \ne i$ we have 

$$R_{vj} = \omega s_j - E(v,j) - \gamma_v$$ 
$$= \omega s_j - E(v,j) - \gamma'_v - \delta_i$$ 
$$\ge \omega s_j - E(v,j) - \beta_v - \delta_i$$ 
$$= \omega s_j - E(v,j) - \min_{k \ne i}\, [\omega s_k - E(v,k)] + \epsilon_2 - \delta_i$$ 
$$\ge 1 > 0$$ 

using the choice of $\epsilon_2$. 

5.9 Choice of $\Gamma$ 

We have shown how to choose strictly positive row sums $R_{uj}$ and column sums $R_{vi}$ of $\Gamma_{S_i S_j}$ (for 
$i \ne j$). There is still considerable freedom in choosing the individual entries, but we will make what 
seems to be the simplest choice: we take $\Gamma_{S_i S_j}$ to be the unique rank-one matrix satisfying these 
row and column sums, namely 

$$(\Gamma_{S_i S_j})_{uv} = \frac{R_{uj} R_{vi}}{T_{ij}}$$ 

where $T_{ij}$ is the total sum of all entries of $\Gamma_{S_i S_j}$, 

$$T_{ij} = \sum_{u \in S_i} R_{uj} = \sum_{v \in S_j} R_{vi}. \qquad (29)$$ 

We showed earlier that this last equality is guaranteed by (10). As the row sums $R_{uj}$ are all positive 
with high probability (28), it follows that $\Gamma_{S_i S_j} > 0$. 
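
The rank-one construction is easy to verify on a small example. This is a minimal sketch with made-up row and column sums; the function name `rank_one_from_sums` is an assumption for illustration.

```python
import numpy as np

def rank_one_from_sums(row_sums, col_sums):
    """Rank-one matrix with the given row and column sums (their totals must agree)."""
    row_sums = np.asarray(row_sums, dtype=float)
    col_sums = np.asarray(col_sums, dtype=float)
    total = row_sums.sum()
    if not np.isclose(total, col_sums.sum()):
        raise ValueError("row sums and column sums must have the same total")
    return np.outer(row_sums, col_sums) / total

block = rank_one_from_sums([1.0, 2.0, 3.0], [2.0, 4.0])   # hypothetical sums, total 6
print(block)
print(block.sum(axis=1), block.sum(axis=0))
```

Positivity of every entry follows immediately from positivity of the row and column sums, which is the only property of this block that the certificate needs beyond the sums themselves.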
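5.10 PSD calculation for $\Lambda$ 

We have already shown that if we choose $\nu_v$ and $R_{vj}$ according to (8) and (9) respectively, then 
we have $\mathrm{span}\{1_i - 1_j\} \subseteq \ker(\Lambda)$. In order to show $\Lambda \succeq 0$ with $\ker(\Lambda) = \mathrm{span}\{1_i - 1_j\}$, we need to 
show 

$$x^\top \Lambda x > 0 \quad \forall x \perp \mathrm{span}\{1_i - 1_j\},\ x \ne 0.$$ 

The orthogonal complement of $\mathrm{span}\{1_i - 1_j\}$ is spanned by the block 0-sum vectors 
$Z = \{ z \in \mathbb{R}^n : \sum_{v \in S_i} z_v = 0\ \forall i \}$ plus the additional vector $y' = \sum_i \frac{1}{s_i} 1_i$. Let 
$y = \frac{y'}{\|y'\|} = \sum_i \frac{1}{s_i} 1_i \Big/ \sqrt{\sum_i \frac{1}{s_i}}$. Fix 
$x \perp \mathrm{span}\{1_i - 1_j\}$ with $\|x\| = 1$ and write $x = \beta y + \sqrt{1 - \beta^2}\, z$ for $\beta \in [0,1]$ and $z \in Z$ with 
$\|z\| = 1$. We have 

$$x^\top \Lambda x = \beta^2\, y^\top \Lambda y + 2\beta \sqrt{1 - \beta^2}\, z^\top \Lambda y + (1 - \beta^2)\, z^\top \Lambda z. \qquad (30)$$ 

We will bound the three terms in (30) separately. In particular, we will show that (with high 
probability): 

$$y^\top \Lambda y = \Omega(\log n),$$ 
$$|z^\top \Lambda y| = O(\log n/(\log\log n)^2), \qquad (31)$$ 
$$z^\top \Lambda z = \Omega(\log n/\log\log n).$$ 

Once we have this, we can (for sufficiently large $n$) rewrite (30) as 

$$x^\top \Lambda x \ge \beta^2 C_1 \log n - 2\beta \sqrt{1 - \beta^2}\, C_2 \frac{\log n}{(\log\log n)^2} + (1 - \beta^2) C_3 \frac{\log n}{\log\log n}$$ 

for some positive constants $C_1, C_2, C_3$. For sufficiently large $n$ we have 

$$\sqrt{(C_1 \log n) \left( C_3 \frac{\log n}{\log\log n} \right)} > C_2 \frac{\log n}{(\log\log n)^2}$$ 

which implies $x^\top \Lambda x > 0$ for all $\beta \in [0,1]$, completing the proof that $\Lambda \succeq 0$ with $\ker(\Lambda) = \mathrm{span}\{1_i - 1_j\}$. 
It remains to show the three bounds in (31). 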

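5.10.1 Compute $y^\top \Lambda y$ 

$$y'^\top \Lambda y' = y'^\top (\mathrm{diag}(\nu) + \omega J - A - \Gamma)\, y'$$ 
$$= \sum_i \frac{1}{s_i^2} \sum_{v \in S_i} \nu_v + \omega r^2 - \sum_{i,j} \frac{E(i,j)}{s_i s_j} - \sum_{i \ne j} \frac{T_{ij}}{s_i s_j}$$ 
$$\overset{(a)}{=} \sum_i \frac{1}{s_i^2} \sum_{v \in S_i} \nu_v + \omega r^2 - \sum_{i,j} \frac{E(i,j)}{s_i s_j} - \sum_{i \ne j} \left[ \omega - \frac{E(i,j)}{s_i s_j} - \frac{c}{s_i s_j} \right]$$ 
$$\overset{(b)}{=} \sum_i \frac{1}{s_i^2} \left( E(i,i) - \omega s_i^2 + c \right) + \omega r^2 - \sum_i \frac{E(i,i)}{s_i^2} - \omega r(r-1) + c \sum_{i \ne j} \frac{1}{s_i s_j}$$ 
$$= -\omega r + c \sum_i \frac{1}{s_i^2} + \omega r^2 - \omega r(r-1) + c \sum_{i \ne j} \frac{1}{s_i s_j}$$ 
$$= c \Big( \sum_i \frac{1}{s_i} \Big)^2$$ 

where (a) expands $\sum_{i \ne j} T_{ij}$ using (29), (9), (10), and (b) expands $\sum_{v \in S_i} \nu_v$ using (8), (10). 
Therefore 

$$y^\top \Lambda y = \frac{1}{\|y'\|^2}\, y'^\top \Lambda y' = c \sum_i \frac{1}{s_i}.$$ 

Since in (14) we chose $c$ to be $\Theta(n \log n)$, we have $y^\top \Lambda y = \Theta(\log n)$ as desired. 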


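5.10.2 Lower bound for $z^\top \Lambda y$ 

For $v \in S_i$, 

$$(\Lambda y')_v = \frac{\nu_v}{s_i} + \omega r - \sum_j \frac{E(v,j)}{s_j} - \sum_{j \ne i} \frac{R_{vj}}{s_j}$$ 
$$= \frac{1}{s_i} \left( E(v,i) - \omega s_i + \gamma_v \right) + \omega r - \sum_j \frac{E(v,j)}{s_j} - \sum_{j \ne i} \frac{1}{s_j} \left( \omega s_j - E(v,j) - \gamma_v \right)$$ 
$$= \gamma_v \sum_j \frac{1}{s_j}$$ 

and so 

$$(\Lambda y)_v = \gamma_v \sqrt{\sum_j \frac{1}{s_j}}.$$ 

Let $(\Lambda y)^Z$ denote the projection of the vector $\Lambda y$ onto the subspace $Z$. For $v \in S_i$ we have 

$$(\Lambda y)^Z_v = \Big( \gamma_v - \frac{1}{s_i} \sum_{u \in S_i} \gamma_u \Big) \sqrt{\sum_j \frac{1}{s_j}} = \Big( \gamma_v - \frac{c}{s_i} \Big) \sqrt{\sum_j \frac{1}{s_j}}.$$ 

Now we have 

$$z^\top \Lambda y \ge -\|(\Lambda y)^Z\| = -\sqrt{\sum_i \sum_{v \in S_i} \Big( \gamma_v - \frac{c}{s_i} \Big)^2}\ \sqrt{\sum_i \frac{1}{s_i}}$$ 
$$\ge -\sqrt{\sum_i \sum_{v \in S_i} O(\Delta'_v)^2}\ \sqrt{\sum_i \frac{1}{s_i}}$$ 
$$= -O(\log n/(\log\log n)^2),$$ 

using (23) and (25). 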


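5.10.3 Lower bound for $z^\top \Lambda z$ 

Note that $J$, $\mathbb{E}A + pI$ and $\mathbb{E}\Gamma$ are block-constant and so the quadratic forms $z^\top J z$, $z^\top (\mathbb{E}A + pI) z$ 
and $z^\top (\mathbb{E}\Gamma) z$ are zero for $z \in Z$. Then 

$$z^\top \Lambda z = z^\top (\mathrm{diag}(\nu) + \omega J - A - \Gamma) z$$ 
$$= z^\top \mathrm{diag}(\nu) z - z^\top (A - \mathbb{E}A) z + p\, z^\top I z - z^\top (\Gamma - \mathbb{E}\Gamma) z$$ 
$$\ge \min_v \nu_v + p - \|A - \mathbb{E}A\| - \|\Gamma - \mathbb{E}\Gamma\|.$$ 

Earlier we showed (27) 

$$\min_v \nu_v \ge \log n/\log\log n$$ 

with high probability and we have $p = O(\frac{\log n}{n}) = o(\log n/\log\log n)$. It remains to bound $\|A - \mathbb{E}A\|$ 
and $\|\Gamma - \mathbb{E}\Gamma\|$. The next two sections show that each of these two terms is $o(\log n/\log\log n)$ with 
high probability. It then follows that $z^\top \Lambda z = \Omega(\log n/\log\log n)$, as desired. 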
5.10.4 Upper bound for $\|A - \mathbb{E}A\|$ 

Strong bounds for the spectral norm $\|A - \mathbb{E}A\|$ have already appeared in the block model literature. 
Specifically, Theorem 5.2 of [LR15] is considerably stronger than we need; it follows immediately that 

$$\|A - \mathbb{E}A\| \le O(\sqrt{\log n}) = o(\log n/\log\log n).$$ 


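5.10.5 Upper bound for $\|\Gamma - \mathbb{E}\Gamma\|$ 

Recall that $\Gamma_{S_i S_j}$ has row sums $R_{uj}$ and total sum (of all entries) 

$$T_{ij} = \sum_{u \in S_i} R_{uj} = \omega s_i s_j - E(i,j) - c.$$ 

By applying Bernstein’s inequality (Lemma 12) to $E(i,j)$ we get a high-probability bound for 
$T_{ij}$: 

$$T_{ij} = \omega s_i s_j - E(i,j) - c = (\omega - q) s_i s_j - c \pm O(\sqrt{n}\, \log n). \qquad (33)$$ 

We now compute $\mathbb{E}\Gamma$. This is block-constant, by symmetry under permuting vertices within 
each community, and this constant must be 

$$\mathbb{E}[\Gamma_{uv}] = \frac{1}{s_i s_j} \mathbb{E}[T_{ij}] = \frac{(\omega - q) s_i s_j - c}{s_i s_j}$$ 

where $u \in S_i$, $v \in S_j$, $i \ne j$. 

We can now compute 

$$\Gamma_{uv} - \mathbb{E}\Gamma_{uv} = \frac{R_{uj} R_{vi}}{T_{ij}} - \frac{(\omega - q) s_i s_j - c}{s_i s_j}$$ 
$$= \frac{s_i s_j R_{uj} R_{vi} - ((\omega - q) s_i s_j - c)\, T_{ij}}{s_i s_j\, T_{ij}}$$ 
$$\overset{(a)}{=} \frac{\pm O\big( n^2 \log n\, (\Delta'_u + \Delta'_v) + n^2 \Delta'_u \Delta'_v \big)}{s_i s_j ((\omega - q) s_i s_j - c) + o(n^3 \log n)}$$ 
$$\overset{(b)}{=} \frac{\pm O\big( n^2 \log n\, (\Delta'_u + \Delta'_v) + n^2 \Delta'_u \Delta'_v \big)}{\Theta(n^3 \log n)}$$ 
$$= \pm O\big( (\Delta'_u + \Delta'_v)/n + \Delta'_u \Delta'_v/(n \log n) \big),$$ 

where in step (a), we appeal to the bounds (24), (33), causing cancellations in the high-order terms; 
and in (b) we have used the choice of $c$ (14) to check that the denominator is $\Theta(n^3 \log n)$. 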

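We will bound the spectral norm of $\Gamma - \mathbb{E}\Gamma$ by its Frobenius norm. While this bound is often 
weak, it will suffice here as $\Gamma$ has constant rank, so that we only expect to lose a constant factor. 

$$\|\Gamma - \mathbb{E}\Gamma\| \le \|\Gamma - \mathbb{E}\Gamma\|_F = \sqrt{\sum_{u,v} (\Gamma_{uv} - \mathbb{E}\Gamma_{uv})^2}$$ 
$$= \sqrt{\sum_{u,v} O\left( \frac{\Delta'_u + \Delta'_v}{n} + \frac{\Delta'_u \Delta'_v}{n \log n} \right)^2}$$ 
$$\overset{(a)}{\le} \frac{1}{n} \sqrt{\sum_{u,v} O(\Delta'_u + \Delta'_v)^2} + \frac{1}{n \log n} \sqrt{\sum_{u,v} O(\Delta'_u \Delta'_v)^2}$$ 
$$= \frac{1}{n} \sqrt{\sum_{u,v} O(\Delta'_u + \Delta'_v)^2} + \frac{1}{n \log n} \sum_v O(\Delta'_v)^2$$ 
$$\overset{(b)}{=} O(\log n/(\log\log n)^2) = o(\log n/\log\log n),$$ 

where (a) uses the triangle inequality for the Euclidean norm and (b) uses the high-probability 
bounds (25), (26) for expressions involving $\Delta'_v$. 

This completes the proof that $\Lambda \succeq 0$ with $\ker(\Lambda) = \mathrm{span}\{1_i - 1_j\}$. We have now satisfied all 
conditions of Proposition 10 and so we may conclude Theorem 2: Programs 4 and 5 achieve exact 
recovery with probability $1 - o(1)$. 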
A Proof of Lemma 1 

In this appendix we verify, for all $0 < q < p < 1$, that $q < \omega < p$, where 

$$\omega = \frac{\log(1 - q) - \log(1 - p)}{\log p - \log q + \log(1 - q) - \log(1 - p)}.$$ 

The proof is an elementary computation using the bound: $1 - \frac{1}{x} \le \log x \le x - 1$ for all $x > 0$, with 
both inequalities strict unless $x = 1$. 

For the lower bound, we proceed as follows: 

$$\frac{1}{\omega} = 1 + \frac{\log \frac{p}{q}}{\log \frac{1-q}{1-p}} < 1 + \frac{\frac{p}{q} - 1}{1 - \frac{1-p}{1-q}} = \frac{1}{q},$$ 

and for the upper bound, we proceed similarly: 

$$\frac{1}{\omega} = 1 + \frac{\log \frac{p}{q}}{\log \frac{1-q}{1-p}} > 1 + \frac{1 - \frac{q}{p}}{\frac{1-q}{1-p} - 1} = \frac{1}{p},$$ 

so that the result follows by taking reciprocals. 
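
The inequality can also be spot-checked numerically over a grid of parameter values. This is a small sketch, not a substitute for the proof; the function name `omega` and the grid of test points are choices made for this example.

```python
import math

def omega(p, q):
    """omega = [log(1-q) - log(1-p)] / [log p - log q + log(1-q) - log(1-p)] for 0 < q < p < 1."""
    num = math.log(1 - q) - math.log(1 - p)
    return num / (math.log(p) - math.log(q) + num)

grid = [k / 20 for k in range(1, 20)]                     # 0.05, 0.10, ..., 0.95
pairs = [(p, q) for p in grid for q in grid if q < p]
all_between = all(q < omega(p, q) < p for p, q in pairs)
print(len(pairs), all_between)
```

Every pair on the grid satisfies the strict inequalities, consistent with the two bounds derived above.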
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B Proof of Proposition [4] 


In this appendix, we establish a closed form for the CH-divergence in the planted partition model. 
The CH-divergence is defined in [AS15a] as 

$$
D_+(i,j) = \sup_{t \in [0,1]} \sum_k \pi_k \left( t\,Q_{ik} + (1-t)\,Q_{jk} - Q_{ik}^{t}\, Q_{jk}^{1-t} \right),
$$

summing over all communities k including i and j. (The limits on t are unimportant: one can show 
that the supremum over $t \in \mathbb{R}$ is always attained in [0,1].) 

Note that if $Q_{ik} = Q_{jk}$ then the k term of this sum vanishes. In the planted partition model, 
we have $Q_{ii} = Q_{jj} = p$, $Q_{ij} = Q_{ji} = q$, and $Q_{ik} = q = Q_{jk}$ for all other k. In particular, only the i 
and j terms of the sum will contribute. Thus: 

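$$
\begin{aligned}
D_+(i,j) &= \sup_t \; tp\pi_i + (1-t)q\pi_i + tq\pi_j + (1-t)p\pi_j - \pi_i p^{t} q^{1-t} - \pi_j p^{1-t} q^{t} \\
&= \sup_t \; t(p-q)(\pi_i - \pi_j) + \pi_i q\left(1 - (p/q)^{t}\right) + \pi_j p\left(1 - (p/q)^{-t}\right).
\end{aligned}
$$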

Substituting $u = t(\log p - \log q)$, we obtain 

$$
D_+(i,j) = \sup_{u \ge 0} \; u\tau(\pi_i - \pi_j) + \pi_i q\left(1 - e^{u}\right) + \pi_j p\left(1 - e^{-u}\right),
$$

where $\tau$ is as defined in Proposition 4. 

As the supremand is smooth and concave in u, we can set the derivative in u equal to zero in 
order to maximize it: 

$$
0 = \tau(\pi_i - \pi_j) - \pi_i q\, e^{u} + \pi_j p\, e^{-u},
$$

which is quadratic in $e^{u}$, and can be solved via the quadratic formula: 

$$
e^{u} = \frac{\tau(\pi_i - \pi_j) + \sqrt{\tau^2(\pi_i - \pi_j)^2 + 4\pi_i\pi_j pq}}{2\pi_i q}.
$$


To obtain the simplest possible form for $D_+$, it is worth also solving for $e^{-u}$: 

$$
e^{-u} = \frac{\tau(\pi_j - \pi_i) + \sqrt{\tau^2(\pi_i - \pi_j)^2 + 4\pi_i\pi_j pq}}{2\pi_j p}.
$$

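Dividing these two expressions, we obtain 

$$
e^{2u} = \frac{\pi_j p}{\pi_i q} \cdot \frac{\tau(\pi_i - \pi_j) + \gamma}{\tau(\pi_j - \pi_i) + \gamma},
$$

where $\gamma = \sqrt{\tau^2(\pi_i - \pi_j)^2 + 4\pi_i\pi_j pq}$. We now express u as half the log of this quantity. 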

Substituting back into the divergence, we obtain 


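$$
D_+(i,j) = \pi_i q + \pi_j p - \gamma + \frac{1}{2}\,\tau(\pi_i - \pi_j) \log\left( \frac{\pi_j p}{\pi_i q} \cdot \frac{\tau(\pi_i - \pi_j) + \gamma}{\tau(\pi_j - \pi_i) + \gamma} \right),
$$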


thus proving the proposition. 
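
The closed form can be compared against a direct numerical maximization of the two-term supremand over $t \in [0,1]$. The sketch below is only a sanity check under stated assumptions: it takes $\tau = (p-q)/(\log p - \log q)$, which is the value forced by the substitution $u = t(\log p - \log q)$ above, and the function names and the grid resolution are illustrative choices rather than anything from the original text.

```python
import math

def tau(p, q):
    # Assumed definition of tau, inferred from t(p - q) = u * tau with u = t(log p - log q).
    return (p - q) / (math.log(p) - math.log(q))

def closed_form(p, q, pi_i, pi_j):
    """Closed form for D_+(i, j) in the planted partition model."""
    d = tau(p, q) * (pi_i - pi_j)
    gamma = math.sqrt(d * d + 4 * pi_i * pi_j * p * q)
    ratio = (pi_j * p * (d + gamma)) / (pi_i * q * (gamma - d))  # equals e^{2u}
    return pi_i * q + pi_j * p - gamma + 0.5 * d * math.log(ratio)

def objective(t, p, q, pi_i, pi_j):
    """The i and j terms of the CH-divergence sum at a fixed t."""
    term_i = pi_i * (t * p + (1 - t) * q - p ** t * q ** (1 - t))
    term_j = pi_j * (t * q + (1 - t) * p - q ** t * p ** (1 - t))
    return term_i + term_j

def numeric_sup(p, q, pi_i, pi_j, steps=20000):
    """Grid approximation of the supremum over t in [0, 1]."""
    return max(objective(k / steps, p, q, pi_i, pi_j) for k in range(steps + 1))
```

When the two communities have equal size, the logarithmic term drops out and the closed form reduces to $\pi(\sqrt{p} - \sqrt{q})^2$, which gives a convenient second check.
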


C Proof of Proposition 6 

In Proposition 4 we found that $D_+(i,j) = \eta(p,q,\pi_i,\pi_j)$ for a certain explicit function $\eta$. We wish 
to see that $\eta$ is monotone increasing in its third and fourth parameters. This implies, for example, 
that when checking whether exact recovery is possible in the planted partition model, it suffices to 
check that the divergence is at least 1 between the two smallest communities. 

Note that $\eta(a, b, \alpha c, \alpha d) = \alpha\,\eta(a, b, c, d)$, and that $\eta(a, b, c, d) = \eta(a, b, d, c)$, so it suffices to show 
that $\eta(p,q,s,1)$ is monotone in s. 

As $\eta$ is smooth, we will show that $\frac{\partial \eta}{\partial s} \ge 0$. We will show this, in turn, by showing that 
$\lim_{s \to 0} \frac{\partial \eta}{\partial s} \ge 0$ and that $\frac{\partial^2 \eta}{\partial s^2} \ge 0$. 


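We first compute 

$$
\frac{\partial}{\partial s}\,\eta(p,q,s,1) = \frac{1}{2s}\left[ 2qs - \sqrt{\tau^2(s-1)^2 + 4pqs} + \tau\left( 1 - s + s \log \frac{p\left(\tau(s-1) + \sqrt{\tau^2(s-1)^2 + 4pqs}\right)}{qs\left(-\tau(s-1) + \sqrt{\tau^2(s-1)^2 + 4pqs}\right)} \right) \right],
$$

$$
\frac{\partial^2}{\partial s^2}\,\eta(p,q,s,1) = \frac{\tau^2(1 + s^2) + 2pqs - \tau(1+s)\sqrt{\tau^2(s-1)^2 + 4pqs}}{2s^2\sqrt{\tau^2(s-1)^2 + 4pqs}}.
$$
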

• To see that the second partial is non-negative, it suffices to see that the numerator is non-negative: 

$$
\tau^2(1 + s^2) + 2pqs \;\ge\; \tau(1+s)\sqrt{\tau^2(s-1)^2 + 4pqs}.
$$

Both sides are non-negative, so it is equivalent to compare their squares: 

$$
\tau^4(1 + s^2)^2 + 4pqs\,\tau^2(1 + s^2) + 4p^2q^2s^2 \;\ge\; \tau^2(1+s)^2\left(\tau^2(s-1)^2 + 4pqs\right),
$$

which is true when $s > 0$, as the difference of the two sides is $4s^2(\tau^2 - pq)^2 \ge 0$. 

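• To see that the limit $\lim_{s \to 0} \frac{\partial \eta}{\partial s}$ is non-negative, we first compute it: 

$$
\lim_{s \to 0} \frac{\partial \eta}{\partial s} = -\frac{pq}{\tau} + \frac{p \log\frac{p}{\tau} - q \log\frac{q}{\tau}}{\log p - \log q}.
$$

We now divide through by q, and set $\alpha = p/q$ and $\beta = \tau/q$, noting that $1 < \beta < \alpha$: 

$$
\frac{1}{q} \lim_{s \to 0} \frac{\partial \eta}{\partial s} = \alpha\left(1 - \frac{1}{\beta}\right) + \beta \log\frac{1}{\beta} > 0.
$$

For the final inequality, note that $\tau = (p-q)/(\log p - \log q)$ gives $\beta \log \alpha = \alpha - 1$, so the expression equals 
$1 - \frac{\alpha}{\beta} + \beta \log\frac{\alpha}{\beta}$. By the bound $\log x \ge 1 - \frac{1}{x}$, this is at least $\left(\frac{\alpha}{\beta} - 1\right)\left(\frac{\beta^2}{\alpha} - 1\right)$, which is 
positive because $\beta < \alpha$ and $\tau^2 > pq$ (the logarithmic mean exceeds the geometric mean). 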


This proves the proposition. 
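
The monotonicity claim and the two partial derivatives can be checked numerically. The sketch below assumes $\tau = (p-q)/(\log p - \log q)$ and uses the closed form for $\eta$ from Appendix B. The function names, the test values of p and q, and the grid in s are illustrative assumptions, and a finite grid only illustrates the proposition rather than proving it.

```python
import math

def tau(p, q):
    # Assumed definition of tau (the logarithmic mean of p and q).
    return (p - q) / (math.log(p) - math.log(q))

def eta(p, q, c, d):
    """Closed form eta(p, q, pi_i, pi_j) from Appendix B."""
    diff = tau(p, q) * (c - d)
    gamma = math.sqrt(diff * diff + 4 * c * d * p * q)
    ratio = (d * p * (diff + gamma)) / (c * q * (gamma - diff))
    return c * q + d * p - gamma + 0.5 * diff * math.log(ratio)

def d_eta_ds(p, q, s):
    """First partial derivative of eta(p, q, s, 1) with respect to s."""
    t = tau(p, q)
    root = math.sqrt(t * t * (s - 1) ** 2 + 4 * p * q * s)
    log_term = math.log(p * (t * (s - 1) + root) / (q * s * (root - t * (s - 1))))
    return (2 * q * s - root + t * (1 - s + s * log_term)) / (2 * s)

def d2_eta_ds2(p, q, s):
    """Second partial derivative of eta(p, q, s, 1) with respect to s."""
    t = tau(p, q)
    root = math.sqrt(t * t * (s - 1) ** 2 + 4 * p * q * s)
    return (t * t * (1 + s * s) + 2 * p * q * s - t * (1 + s) * root) / (2 * s * s * root)

# Evaluate eta(p, q, s, 1) on a grid in s and record whether it is non-decreasing.
p, q = 0.6, 0.2
grid = [0.05 * k for k in range(1, 61)]  # s from 0.05 to 3.0
values = [eta(p, q, s, 1.0) for s in grid]
is_monotone = all(a <= b for a, b in zip(values, values[1:]))
```

Comparing `d_eta_ds` and `d2_eta_ds2` against central finite differences of `eta` is a cheap way to catch sign or transcription slips in the displayed derivatives.
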